Basic Functionality

Almost all functionality is incorporated in the class MTCFeatureLoader. An MTCFeatureLoader object takes a .jsonl file as its source: a text file (optionally gzipped) with one JSON object per line, each representing a melody. A melody object contains metadata fields and several sequences of feature values.
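To illustrate the file format, here is a minimal sketch that reads melody objects from a .jsonl or .jsonl.gz file using only the Python standard library. It assumes that a .gz extension marks a gzipped file. The function iter_melodies is hypothetical and is not the MTCFeatureLoader implementation.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_melodies(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one melody object (a dict) per non-empty line of a .jsonl(.gz) file.

    Hypothetical helper for illustration; not part of MTCFeatures.
    """
    # Assumption: gzipped files are recognized by their extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                # json.loads turns JSON null into Python None
                yield json.loads(line)
```

Each yielded dict holds the metadata fields and feature sequences of one melody. Undefined values arrive as None, which is what the None replacement rules further down deal with.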

Several .jsonl files are provided with the module:

MTC-ANN-2.0.1 - Feature sequences for the melodies of MTC-ANN-2.0.1.
MTC-FS-INST-2.0 - Feature sequences for the melodies of MTC-FS-INST-2.0.
ESSEN - Feature sequences for the melodies in the ESSEN Folksong Collection.

The MTCFeatureLoader can be initialized either with one of these, or with a user-provided .jsonl or .jsonl.gz file:

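from MTCFeatures import MTCFeatureLoader
# Each of the following lines is an alternative way to create a loader.
fl = MTCFeatureLoader('MTC-ANN-2.0.1')                 # bundled data set
fl = MTCFeatureLoader('MTC-FS-INST-2.0')               # bundled data set
fl = MTCFeatureLoader('../path/to/my/file.jsonl.gz')   # user-provided, gzipped
fl = MTCFeatureLoader('/path/to/my/file.jsonl')        # user-provided, plain text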

The MTCFeatureLoader class provides the following functionality:

Melody filtering : select melodies according to given criteria
Feature selection : keep a subset of the features
Feature extraction : compute a new feature from existing features and add it to the object
None replacement : replace undefined feature values (null in JSON, None in Python) with sensible fallback values
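Feature selection and extraction can be pictured on a single melody object. The sketch below assumes, hypothetically, that a melody is a dict whose "features" entry maps feature names to equal-length lists. The helper names and the midipitch feature are illustrative only, not the module's API.

```python
from typing import Any, Callable, Dict, Iterable, List

Melody = Dict[str, Any]


def select_features(melody: Melody, keep: Iterable[str]) -> Melody:
    """Return a copy of melody that keeps only the feature sequences named in keep."""
    keep_set = set(keep)
    result = dict(melody)  # shallow copy: metadata fields are shared
    result["features"] = {k: v for k, v in melody["features"].items() if k in keep_set}
    return result


def add_feature(melody: Melody, name: str,
                func: Callable[[Dict[str, List[Any]]], List[Any]]) -> Melody:
    """Compute a new feature sequence from existing ones and store it under name."""
    result = dict(melody)
    result["features"] = dict(melody["features"])  # do not mutate the input
    result["features"][name] = func(melody["features"])
    return result


def midi_interval(features: Dict[str, List[Any]]) -> List[Any]:
    """Example extractor: semitone interval to the previous note.

    The first note has no predecessor, so its value is None.
    """
    pitches = features["midipitch"]  # hypothetical feature name
    return [None] + [b - a for a, b in zip(pitches, pitches[1:])]
```

For example, adding "interval" to a melody with midipitch [67, 72, 71] and then keeping only that feature leaves {"interval": [None, 5, -1]}.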

Operations can be chained. All feature extractors, feature selectors, melody filters, and replaceNone return an iterator over the resulting sequences, and each accepts a parameter seq_iter. If seq_iter is None (the default), the .jsonl file is used as the data source and a new iterator is created. Otherwise, the provided iterator is used as the data source. The method applyFilters takes a list of filter names and applies the filters in the given order.
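The seq_iter convention can be sketched with Python generators. ToyLoader below is a hypothetical stand-in, not MTCFeatureLoader: it reads from an in-memory list instead of a .jsonl file, and its filter predicates, metadata field names (id, tunefamily), and the apply_one method are assumptions made for illustration.

```python
import itertools
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Melody = Dict[str, Any]


class ToyLoader:
    """Hypothetical stand-in that mimics the seq_iter convention."""

    def __init__(self, melodies: List[Melody]) -> None:
        # The real class reads a .jsonl file; an in-memory list plays that role here.
        self._melodies = melodies
        # Registry of named filters: name -> predicate on one melody (fields assumed).
        self.filters: Dict[str, Callable[[Melody], bool]] = {
            "labeled": lambda m: m.get("tunefamily") is not None,
            "firstvoice": lambda m: m["id"].endswith("_01"),
        }

    def _source(self, seq_iter: Optional[Iterable[Melody]]) -> Iterator[Melody]:
        # seq_iter=None starts a fresh pass over the data source;
        # otherwise the given iterator is used as the data source.
        return iter(self._melodies if seq_iter is None else seq_iter)

    def apply_one(self, name: str,
                  seq_iter: Optional[Iterable[Melody]] = None) -> Iterator[Melody]:
        pred = self.filters[name]
        return (m for m in self._source(seq_iter) if pred(m))

    def head(self, n: int,
             seq_iter: Optional[Iterable[Melody]] = None) -> Iterator[Melody]:
        return itertools.islice(self._source(seq_iter), n)

    def applyFilters(self, names: List[str],
                     seq_iter: Optional[Iterable[Melody]] = None) -> Iterator[Melody]:
        # Each filter lazily consumes the iterator produced by the previous one.
        for name in names:
            seq_iter = self.apply_one(name, seq_iter)
        return self._source(seq_iter)
```

In this sketch, chaining means passing one call's result as seq_iter to the next, e.g. fl.head(10, seq_iter=fl.applyFilters(['labeled', 'firstvoice'])). The results are lazy, one-shot iterators: once consumed they are exhausted, and a new call with seq_iter=None starts over from the source.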

The following filters are registered in class MTCFeatureLoader:

vocal : Only keep vocal melodies
instrumental : Only keep instrumental melodies
firstvoice : Only keep first voices/stanzas (i.e., identifiers ending in _01)
ann_bgcorpus : Only keep melodies unrelated to MTC-ANN (only applicable to MTC-FS-INST)
labeled : Only keep melodies with a tune family label
unlabeled : Only keep melodies without a tune family label
afteryear(year) : Only keep melodies from sources dated after year (year itself excluded)
beforeyear(year) : Only keep melodies from sources dated before year (year itself excluded)
betweenyears(year1, year2) : Only keep melodies from sources dated between year1 and year2 (both bounds excluded)
inOGL : Only keep melodies that are part of Onder de Groene Linde
inNLBIDs(id_list) : Only keep melodies with given identifiers in id_list
inTuneFamilies(tf_list) : Only keep melodies in given tune families in tf_list
inInstTest : Only keep melodies that are in cINST
origin(location) : Only keep melodies whose origin metadata field contains location (ESSEN only)
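Several of these filters reduce to a simple test on one melody. The sketch below writes a few of them as predicates to make the exclusive year bounds explicit. It assumes hypothetical metadata fields id, year, tunefamily, and origin, and it assumes that melodies without a year are rejected by the year filters. These are illustrations, not the module's code.

```python
from typing import Any, Dict, Iterable, Optional

Melody = Dict[str, Any]


def _year(m: Melody) -> Optional[int]:
    # Hypothetical metadata field; undated melodies give None.
    return m.get("year")


def afteryear(m: Melody, year: int) -> bool:
    y = _year(m)
    return y is not None and y > year  # year itself excluded


def beforeyear(m: Melody, year: int) -> bool:
    y = _year(m)
    return y is not None and y < year  # year itself excluded


def betweenyears(m: Melody, year1: int, year2: int) -> bool:
    y = _year(m)
    return y is not None and year1 < y < year2  # both bounds excluded


def firstvoice(m: Melody) -> bool:
    return m["id"].endswith("_01")


def inTuneFamilies(m: Melody, tf_list: Iterable[str]) -> bool:
    return m.get("tunefamily") in set(tf_list)


def origin(m: Melody, location: str) -> bool:
    # Substring match anywhere in the origin field; missing field matches nothing.
    return location in (m.get("origin") or "")
```

Such predicates combine naturally with Python's built-in filter, e.g. filter(lambda m: betweenyears(m, 1800, 1900), melodies).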

The following operations are available as separate functions:

minClassSizeFilter : Keep only melodies in tune families with >= minsize members.
maxClassSizeFilter : Keep only melodies in tune families with <= maxsize members.
head : Keep only first n melodies.
tail : Keep only last n melodies.
randomSel : Take a random sample of n melodies.
replaceNone : Replace undefined feature values (None) with sensible fall back values.
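Unlike the registered filters, the class-size filters cannot decide about a melody by looking at it alone: they need the size of every tune family first. The sketch below therefore materializes the input before selecting. The snake_case names and signatures are hypothetical, not the module's own. It assumes that melodies without a tune family are dropped by the class-size filters and that randomSel returns all melodies when n exceeds their number.

```python
import itertools
import random
from collections import Counter, deque
from typing import Any, Dict, Iterable, List, Optional

Melody = Dict[str, Any]


def _family_sizes(melodies: List[Melody]) -> Counter:
    # Count members per tune family; unlabeled melodies are ignored.
    return Counter(m["tunefamily"] for m in melodies if m.get("tunefamily") is not None)


def min_class_size(seq_iter: Iterable[Melody], minsize: int) -> List[Melody]:
    melodies = list(seq_iter)  # two passes needed: count first, then select
    sizes = _family_sizes(melodies)
    return [m for m in melodies
            if m.get("tunefamily") is not None and sizes[m["tunefamily"]] >= minsize]


def max_class_size(seq_iter: Iterable[Melody], maxsize: int) -> List[Melody]:
    melodies = list(seq_iter)
    sizes = _family_sizes(melodies)
    return [m for m in melodies
            if m.get("tunefamily") is not None and sizes[m["tunefamily"]] <= maxsize]


def head(seq_iter: Iterable[Melody], n: int) -> List[Melody]:
    return list(itertools.islice(seq_iter, n))


def tail(seq_iter: Iterable[Melody], n: int) -> List[Melody]:
    # A deque with maxlen=n keeps only the last n items it has seen.
    return list(deque(seq_iter, maxlen=n))


def random_sel(seq_iter: Iterable[Melody], n: int, seed: Optional[int] = None) -> List[Melody]:
    melodies = list(seq_iter)
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(melodies, min(n, len(melodies)))
```

head can stop reading early, whereas tail, the class-size filters, and random sampling have to read the whole input before returning anything.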

For replacing None values, a separate rule is defined for each relevant feature. The following rules are included:

metriccontour: None -> "=" if all items in the sequence are None; None -> "+" if only the first item is None
imacontour: First note: None -> "+"
contour3: First note: None -> "="
contour5: First note: None -> "="
IOR: First and possibly last note: None -> 1.0
IOR_frac: First and possibly last note: None -> "1"
durationcontour: First note: None -> "="
restduration_frac: None -> "0"
diatonicinterval: First note: None -> 0
chromaticinterval: First note: None -> 0
nextisrest: Last note: None -> True
beatfraction: None -> "0"
beatinsong: None -> "0"
beatinphrase: None -> "0"
beatinphrase_end: None -> "0"
beatstrength: None -> 1.0
beat_str: None -> "1"
beat_fraction_str: None -> "0"
beat: None -> 0.0
timesignature: None -> "0/0"
lyrics: None -> ""
noncontentword: None -> False
wordend: None -> False
phoneme: None -> ""
rhymes: None -> False
rhymescontentwords: None -> False
wordstress: None -> False
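The rules above fall into a few patterns: replace the first value, the last value, or every None. The sketch below expresses a subset of them as per-feature functions. It assumes a melody's features are a dict mapping feature names to lists. The helper names and the RULES table are hypothetical, not the module's replaceNone. For metriccontour it only implements the two cases stated above.

```python
from typing import Any, Callable, Dict, List

Seq = List[Any]
Rule = Callable[[Seq], Seq]


def _replace_all(value: Any) -> Rule:
    return lambda seq: [value if x is None else x for x in seq]


def _replace_first(value: Any) -> Rule:
    def rule(seq: Seq) -> Seq:
        out = list(seq)  # copy so the input is not mutated
        if out and out[0] is None:
            out[0] = value
        return out
    return rule


def _replace_last(value: Any) -> Rule:
    def rule(seq: Seq) -> Seq:
        out = list(seq)
        if out and out[-1] is None:
            out[-1] = value
        return out
    return rule


def _replace_first_and_last(value: Any) -> Rule:
    first, last = _replace_first(value), _replace_last(value)
    return lambda seq: last(first(seq))


def _metriccontour(seq: Seq) -> Seq:
    if seq and all(x is None for x in seq):
        return ["="] * len(seq)  # whole sequence undefined
    return _replace_first("+")(seq)  # only the first item undefined


RULES: Dict[str, Rule] = {
    "metriccontour": _metriccontour,
    "diatonicinterval": _replace_first(0),
    "chromaticinterval": _replace_first(0),
    "IOR": _replace_first_and_last(1.0),
    "nextisrest": _replace_last(True),
    "beatstrength": _replace_all(1.0),
    "lyrics": _replace_all(""),
}


def replace_none(features: Dict[str, Seq]) -> Dict[str, Seq]:
    # Features without a rule are returned unchanged.
    return {name: RULES[name](seq) if name in RULES else seq
            for name, seq in features.items()}
```

Keeping the rules in a table keyed by feature name means that features without a rule, such as those from the literature models mentioned below, pass through untouched.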

No None replacement rules are included for the features derived from models in the literature (LBDM, GTTM, IR).