Abstract
We report our findings on using MIDI files and audio features extracted from MIDI, separately and in combination, for MIDI music genre classification. We use McKay and Fujinaga's 3-root and 9-leaf genre data set. To compute distances between MIDI pieces, we use the normalized compression distance (NCD). NCD uses the compressed length of a string as an approximation to its Kolmogorov complexity and has previously been used for music genre and composer clustering. We convert the MIDI pieces to audio and then use the audio features to train different classifiers. MIDI and audio-from-MIDI classifiers alone achieve much lower accuracies than those reported by McKay and Fujinaga, who used a number of domain-based MIDI features rather than NCD for their classification. Combining the MIDI and audio-from-MIDI classifiers improves accuracy and approaches, but remains below, McKay and Fujinaga's results. The best root genre accuracies achieved using MIDI, audio, and their combination are 0.75, 0.86, and 0.93, respectively, compared to McKay and Fujinaga's 0.98. Successful classifier combination requires diversity among the base classifiers. We achieve diversity by using a certain number of seconds of each MIDI file, different sample rates and sizes for the audio files, and different classification algorithms.
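The NCD between two strings x and y is conventionally computed as (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C denotes compressed length. A minimal sketch using Python's standard-library bz2 compressor (the compressor choice and byte-string inputs are illustrative assumptions, not the paper's exact setup, which uses the CompLearn software):

```python
import bz2

def compressed_len(data: bytes) -> int:
    """Compressed length as a computable proxy for Kolmogorov complexity."""
    return len(bz2.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for very similar strings,
    near 1 for unrelated ones."""
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

In practice the MIDI pieces are first converted to strings over a finite alphabet, and the resulting pairwise NCD matrix is fed to a distance-based classifier.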
Highlights
The increase of musical databases on the Internet and in multimedia systems has brought great demand for music information retrieval (MIR) applications, and especially for automatic analysis of musical databases.
We report our experiments with linear discriminant classifiers (LDC), which assume normal densities, and k-nearest neighbor (KNN) classifiers.
(vi) Mel-frequency cepstral coefficients (MFCC): MFCCs are well known for speech representation.
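MFCCs are obtained by taking the log energies of a mel-spaced filterbank applied to a frame's power spectrum and decorrelating them with a DCT. The sketch below is a minimal single-frame NumPy illustration; the frame size, filter counts, and Hamming window are assumptions for illustration, not the paper's feature-extraction configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_mels=26, n_mfcc=13):
    """Compute MFCCs for one audio frame (1-D array of samples)."""
    n_fft = len(frame)
    # Power spectrum of the windowed frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, ctr, hi = bins[i - 1], bins[i], bins[i + 1]
        for j in range(lo, ctr):
            fbank[i - 1, j] = (j - lo) / max(ctr - lo, 1)
        for j in range(ctr, hi):
            fbank[i - 1, j] = (hi - j) / max(hi - ctr, 1)
    log_energies = np.log(fbank @ spectrum + 1e-10)
    # DCT-II of the log filterbank energies, keeping the first n_mfcc terms.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), n + 0.5) / n_mels)
    return dct @ log_energies
```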
Summary
The increase of musical databases on the Internet and in multimedia systems has brought great demand for music information retrieval (MIR) applications, and especially for automatic analysis of musical databases. [6, 7] have suggested using an approximation to the Kolmogorov distance between two musical pieces as a means to compute clusters of music. They first process the MIDI representation of a music piece to turn it into a string over a finite alphabet. Acoustic music signals are represented using different audio formats, such as WAV, MP3, AAC, or OGG. We use our preprocessing method [16, 17] on the MIDI files, compute the NCD between them using the CompLearn software (http://www.complearn.org), and apply a k-nearest neighbor classifier to predict the root and leaf genre of each MIDI file.
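The final prediction step can be sketched as a distance-based k-NN majority vote over precomputed pairwise distances. A toy pure-Python illustration (the distance values and genre labels are hypothetical; in the paper the distances would be NCDs between MIDI strings):

```python
from collections import Counter

def knn_predict(distances, labels, k=3):
    """Predict a genre label by majority vote among the k training pieces
    nearest to the query, given precomputed distances (e.g. NCDs) from the
    query piece to every training piece."""
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For k = 1 this reduces to assigning the genre of the single closest training piece.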