Basic Functionality¶
Basically all functionality is incorporated in the class MTCFeatureLoader. A MTCFeatureLoader object takes as source a .jsonl file, which is a text file (optionally gzipped) with on each line a json object representing a melody. A melody object contains metadata fields and several sequences of feature values.
Several .jsonl files are provided with the module:
MTC-ANN-2.0.1- Feature sequences for the melodies of MTC-ANN-2.0.1.MTC-FS-INST-2.0- Feature sequences for the melodies of MTC-FS-INST-2.0.ESSEN- Feature sequences for the melodies in the ESSEN Folksong Collection.
The MTCFeatureLoader can be initialized either with one of these, or with a user provided .jsonl or .jsonl.gz file:
from MTCFeatures import MTCFeatureLoader
fl = MTCFeatureLoader('MTC-ANN-2.0.1')
fl = MTCFeatureLoader('MTC-FS-INST-2.0')
fl = MTCFeatureLoader('../path/to/my/file.jsonl.gz')
fl = MTCFeatureLoader('/path/to/my/file.jsonl')
The MTCFeatureLoader class provides various functionalities:
Melody Filtering : select melodies according to given criteria
Feature selection : keep subset of features
Feature extraction : compute a new feature from existing features and add it to the object
Replace undefined feature values (
nullin json,Nonein Python) with sensible fall back values
Operations can be chained. All feature extractors, feature selectors, object filters, and NoneReplacer
return an iterator over the resulting sequences. Each has a parameter seq_iter. If seq_iter is None (default),
the .jsonl file is taken as data source and a new iterator is created. Otherwise, the provided iterator
is taken as data source. A method, applyFilters is available which takes a list of filter names and applies
these in the provided order.
The following filters are registered in class MTCFeatureLoader:
vocal: Only keep vocal melodiesinstrumental: Only keep instrumental melodiesfirstvoice: Only keep first voices/stanzas (i.e. identifier ending with _01)ann_bgcorpus: Only keep melodies unrelated to MTC-ANN (only applicable to MTC-FS-INST)labeled: Only keep melodies with a tune family labelunlabeled: Only keep melodies without a tune family labelafteryear(year): Only keep melodies in sources dated later than year (year not included)beforeyear(year): Only keep melodies in sources dated before year (year not included)betweenyears(year1, year2): Only keep melodies in sources dated between year1 and year2 (both not included)inOGL: Only keep melodies that are part of Onder de Groene LindeinNLBIDs(id_list): Only keep melodies with given identifiers in id_listinTuneFamilies(tf_list): Only keep melodies in given tune families in tf_listinInstTest: Only keep melodies that are in cINST.origin(location): Only keep melodies iflocationoccurs somewhere in the origin meta data field (only for Essen).
Available as separate functions:
minClassSizeFilter : Keep only melodies in tune families with >=
minsizemembers.maxClassSizeFilter : Keep only melodies in tune families with <=
maxsizemembers.head : Keep only first
nmelodies.tail : Keep only last
nmelodies.randomSel : Take a random sample of
nmelodies.replaceNone : Replace undefined feature values (
None) with sensible fall back values.
For replacement of the None values, a separate rule is included for each of the relevant features. The following rules are included:
metriccontour: None -> ‘=’ if all items in the sequence are None. None -> ‘+’ if only the first item is None.imacontour: First note: None -> “+”contour3: First note: None -> “=”contour5: First note: None -> “=”IOR: First, and possibly last notes: None -> 1.0IOR_frac: First, and possibly last notes: None -> “1”durationcontour: First note: None -> “=”restduration_frac: None -> “0”diatonicinterval: First note: None -> 0chromaticinterval: First note: None -> 0nextisrest: Last note: None -> Truebeatfraction: None -> “0”beatinsong: None -> “0”beatinphrase: None -> “0”beatinphrase_end: None -> “0”beatstrength: None -> 1.0beat_str: None -> “1”beat_fraction_str: None -> “0”beat: None -> 0.0timesignature: None -> “0/0”lyrics: None -> “”noncontentword: None -> Falsewordend: None -> Falsephoneme: None -> ‘’rhymes: None -> Falserhymescontentwords: None -> Falsewordstress: None -> False
For the different models from the literature (LBDM, GTTM, IR) no None-replacers are included.