Basic Functionality

Nearly all functionality is provided by the class MTCFeatureLoader. An MTCFeatureLoader object takes a .jsonl file as its source: a text file (optionally gzipped) in which each line is a JSON object representing one melody. A melody object contains metadata fields and several sequences of feature values.
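
To make the file format concrete, the sketch below reads such a file with the standard library only. It is not the MTCFeatureLoader implementation; the function name iter_melodies is an assumption made for illustration, and the choice to skip empty lines is a convenience of this sketch.

```python
import gzip
import json


def iter_melodies(path):
    """Yield one melody object (a dict) per non-empty line of a .jsonl or .jsonl.gz file."""
    # Pick the opener from the file extension; mode 'rt' makes gzip.open decode text.
    opener = gzip.open if str(path).endswith('.gz') else open
    with opener(path, 'rt', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # skip empty lines (assumption of this sketch)
                yield json.loads(line)
```

Because the function is a generator, melodies are parsed one at a time and the whole file never has to be held in memory.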

Several .jsonl files are provided with the module:

  • MTC-ANN-2.0.1 - Feature sequences for the melodies of MTC-ANN-2.0.1.

  • MTC-FS-INST-2.0 - Feature sequences for the melodies of MTC-FS-INST-2.0.

  • ESSEN - Feature sequences for the melodies in the ESSEN Folksong Collection.

The MTCFeatureLoader can be initialized either with the name of one of these, or with the path to a user-provided .jsonl or .jsonl.gz file:

from MTCFeatures import MTCFeatureLoader

# Load one of the provided data sets by name
fl = MTCFeatureLoader('MTC-ANN-2.0.1')
fl = MTCFeatureLoader('MTC-FS-INST-2.0')

# Or load a user-provided file (gzipped or plain)
fl = MTCFeatureLoader('../path/to/my/file.jsonl.gz')
fl = MTCFeatureLoader('/path/to/my/file.jsonl')

The MTCFeatureLoader class provides various functionalities:

  • Melody filtering : select melodies according to given criteria

  • Feature selection : keep a subset of the features

  • Feature extraction : compute a new feature from existing features and add it to the object

  • None replacement : replace undefined feature values (null in JSON, None in Python) with sensible fallback values

Operations can be chained. All feature extractors, feature selectors, object filters, and the NoneReplacer return an iterator over the resulting sequences. Each has a parameter seq_iter. If seq_iter is None (the default), the .jsonl file is taken as the data source and a new iterator is created. Otherwise, the provided iterator is taken as the data source. A method applyFilters is also available; it takes a list of filter names and applies them in the given order.
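
The chaining convention can be pictured with plain Python generators. The sketch below is not the library's code: the class TinyLoader, its methods keep_labeled, select_features and apply_named, and the field names tunefamily and features are all hypothetical, and an in-memory list stands in for the .jsonl file.

```python
class TinyLoader:
    """Toy stand-in for a loader whose operations accept an optional seq_iter."""

    def __init__(self, melodies):
        self.melodies = melodies  # stands in for the .jsonl source

    def source_iter(self):
        # A fresh iterator over the data source, created on every call.
        return iter(self.melodies)

    def keep_labeled(self, seq_iter=None):
        # With seq_iter=None, start from the data source; otherwise continue the chain.
        if seq_iter is None:
            seq_iter = self.source_iter()
        for mel in seq_iter:
            if mel.get('tunefamily'):
                yield mel

    def select_features(self, featlist, seq_iter=None):
        if seq_iter is None:
            seq_iter = self.source_iter()
        for mel in seq_iter:
            kept = {k: v for k, v in mel['features'].items() if k in featlist}
            # Yield a modified copy so the source objects stay untouched.
            yield {**mel, 'features': kept}

    def apply_named(self, names, seq_iter=None):
        # Apply parameterless operations by name, in the given order.
        if seq_iter is None:
            seq_iter = self.source_iter()
        for name in names:
            seq_iter = getattr(self, name)(seq_iter=seq_iter)
        return seq_iter


# Usage: the output of one operation is passed as seq_iter to the next.
loader = TinyLoader([
    {'id': 'a', 'tunefamily': 'X', 'features': {'pitch': [60], 'duration': [1.0]}},
    {'id': 'b', 'tunefamily': '', 'features': {'pitch': [62], 'duration': [0.5]}},
])
chained = loader.select_features(['pitch'], seq_iter=loader.keep_labeled())
```

Nothing is computed until the final iterator is consumed, so a long chain still makes only one pass over the data.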

The following filters are registered in class MTCFeatureLoader:

  • vocal : Only keep vocal melodies

  • instrumental : Only keep instrumental melodies

  • firstvoice : Only keep first voices/stanzas (i.e. identifier ending with _01)

  • ann_bgcorpus : Only keep melodies unrelated to MTC-ANN (only applicable to MTC-FS-INST)

  • labeled : Only keep melodies with a tune family label

  • unlabeled : Only keep melodies without a tune family label

  • afteryear(year) : Only keep melodies in sources dated later than year (year not included)

  • beforeyear(year) : Only keep melodies in sources dated before year (year not included)

  • betweenyears(year1, year2) : Only keep melodies in sources dated between year1 and year2 (both years not included)

  • inOGL : Only keep melodies that are part of Onder de Groene Linde

  • inNLBIDs(id_list) : Only keep melodies with given identifiers in id_list

  • inTuneFamilies(tf_list) : Only keep melodies in given tune families in tf_list

  • inInstTest : Only keep melodies that are in cINST.

  • origin(location) : Only keep melodies if location occurs somewhere in the origin metadata field (only for Essen).
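
The boundary behaviour of some of these filters is easy to get wrong, so here is a minimal sketch of three of them as plain predicates. It is written under stated assumptions and does not reproduce the library's code: the function names, the generic keep helper, and the metadata field names id, year and origin are hypothetical, as is the treatment of melodies without a year.

```python
def is_first_voice(mel):
    # First voices/stanzas have an identifier ending in _01.
    return mel['id'].endswith('_01')


def between_years(mel, year1, year2):
    # Both bounds are excluded. Undated melodies never match (assumption of this sketch).
    year = mel.get('year')
    return year is not None and year1 < year < year2


def origin_contains(mel, location):
    # Substring match anywhere in the origin field; a missing origin counts as empty.
    return location in (mel.get('origin') or '')


def keep(seq_iter, predicate, **params):
    """Yield only the melodies for which predicate(mel, **params) is true."""
    for mel in seq_iter:
        if predicate(mel, **params):
            yield mel
```

With these definitions, keep(melodies, between_years, year1=1850, year2=1950) drops a melody dated exactly 1850, in line with "year not included" in the list above.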

Available as separate functions:

  • minClassSizeFilter : Keep only melodies in tune families with >= minsize members.

  • maxClassSizeFilter : Keep only melodies in tune families with <= maxsize members.

  • head : Keep only first n melodies.

  • tail : Keep only last n melodies.

  • randomSel : Take a random sample of n melodies.

  • replaceNone : Replace undefined feature values (None) with sensible fallback values.
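
To illustrate what the size and selection functions do to an iterator of melodies, the following sketch re-creates four of them with the standard library. The names min_class_size, head, tail and random_sel, the seed parameter, and the field name tunefamily are assumptions for illustration; whether the library samples with or without replacement is not stated here, so sampling without replacement is also an assumption.

```python
import random
from collections import Counter, deque
from itertools import islice


def min_class_size(seq_iter, minsize, key='tunefamily'):
    # Class sizes are only known after a full pass, so the input is materialized.
    mels = list(seq_iter)
    sizes = Counter(mel[key] for mel in mels)
    return (mel for mel in mels if sizes[mel[key]] >= minsize)


def head(seq_iter, n):
    # Lazily stop after the first n melodies.
    return islice(seq_iter, n)


def tail(seq_iter, n):
    # A bounded deque keeps only the last n items while consuming the input.
    return iter(deque(seq_iter, maxlen=n))


def random_sel(seq_iter, n, seed=None):
    # Sample without replacement; n is capped at the number of melodies.
    mels = list(seq_iter)
    rng = random.Random(seed)
    return iter(rng.sample(mels, min(n, len(mels))))
```

Only head is fully lazy. The other three have to consume the whole input before they can yield anything, which matters when they are placed early in a chain over a large corpus.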

For replacement of the None values, a separate rule is included for each of the relevant features. The following rules are included:

  • metriccontour: None -> ‘=’ if all items in the sequence are None. None -> ‘+’ if only the first item is None.

  • imacontour: First note: None -> “+”

  • contour3: First note: None -> “=”

  • contour5: First note: None -> “=”

  • IOR: First, and possibly last notes: None -> 1.0

  • IOR_frac: First, and possibly last notes: None -> “1”

  • durationcontour: First note: None -> “=”

  • restduration_frac: None -> “0”

  • diatonicinterval: First note: None -> 0

  • chromaticinterval: First note: None -> 0

  • nextisrest: Last note: None -> True

  • beatfraction: None -> “0”

  • beatinsong: None -> “0”

  • beatinphrase: None -> “0”

  • beatinphrase_end: None -> “0”

  • beatstrength: None -> 1.0

  • beat_str: None -> “1”

  • beat_fraction_str: None -> “0”

  • beat: None -> 0.0

  • timesignature: None -> “0/0”

  • lyrics: None -> “”

  • noncontentword: None -> False

  • wordend: None -> False

  • phoneme: None -> ‘’

  • rhymes: None -> False

  • rhymescontentwords: None -> False

  • wordstress: None -> False

For the features based on models from the literature (LBDM, GTTM, IR), no None-replacers are included.
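
A few of the replacement rules listed above can be sketched as a small rule table. This is an illustration under stated assumptions rather than the library's replacer: the names replace_metriccontour, replace_constant, RULES and replace_none are hypothetical, the position-specific rules (first or last note) are simplified to replace None wherever it occurs, and for metriccontour any None other than a leading one is left untouched unless the whole sequence is None.

```python
def replace_metriccontour(values):
    # All None -> '=' everywhere; otherwise only a leading None becomes '+'.
    if all(v is None for v in values):
        return ['='] * len(values)
    out = list(values)
    if out and out[0] is None:
        out[0] = '+'
    return out


def replace_constant(values, fallback):
    # Rules of the form "None -> fallback", applied at any position (simplification).
    return [fallback if v is None else v for v in values]


# One rule per feature; features without a rule are passed through unchanged.
RULES = {
    'metriccontour': replace_metriccontour,
    'IOR': lambda vals: replace_constant(vals, 1.0),
    'restduration_frac': lambda vals: replace_constant(vals, '0'),
    'nextisrest': lambda vals: replace_constant(vals, True),
    'lyrics': lambda vals: replace_constant(vals, ''),
}


def replace_none(features):
    """Return a new feature dict with the rules applied; the input is not modified."""
    return {
        name: RULES[name](vals) if name in RULES else vals
        for name, vals in features.items()
    }
```

Keeping one rule per feature mirrors the list above: a fallback that makes sense for one feature (1.0 for IOR) would be meaningless for another (lyrics), so no single global default is used.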