Abstract

In this work, we present an ensemble for automated music genre classification that fuses acoustic and visual (both handcrafted and non-handcrafted) features extracted from audio files. These features are evaluated, compared, and fused in a final ensemble shown to produce better classification accuracy than other state-of-the-art approaches on the Latin Music Database, ISMIR 2004, and the GTZAN genre collection. To the best of our knowledge, this paper reports the largest test comparing the combination of different descriptors (including a wavelet convolutional scattering network, which has been tested here for the first time as an input for texture descriptors) and different matrix representations. Superior performance is obtained without ad hoc parameter optimisation; that is to say, the same ensemble of classifiers and parameter settings are used on all tested data-sets. To demonstrate generalisability, our approach is also assessed on the tasks of bird species recognition using vocalisation and whale detection data-sets. All MATLAB source code is available.
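The fusion idea described above can be illustrated with a minimal score-level sketch: two classifiers, one trained on acoustic features and one on visual (spectrogram-texture) features, each produce per-class scores that are normalised and summed before the final decision. All names and score values below are invented for illustration; the paper's actual ensemble, descriptors, and fusion weights are given in the full text (and in MATLAB, not Python).

```python
import numpy as np

# Hypothetical per-class scores for one audio clip from two classifiers
# trained on different feature views (values invented for illustration;
# real scores would come from the trained classifiers themselves).
acoustic_scores = np.array([0.1, 0.7, 0.2])  # e.g. an acoustic-feature classifier
visual_scores = np.array([0.3, 0.5, 0.2])    # e.g. a spectrogram-texture classifier

def min_max(scores):
    """Rescale scores to [0, 1] so no classifier dominates by scale alone."""
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)

# Sum-rule fusion: add the normalised score vectors, then pick the
# class with the highest fused score.
fused = min_max(acoustic_scores) + min_max(visual_scores)
predicted_class = int(np.argmax(fused))
print(predicted_class)  # → 1
```

The sum rule is only one common choice for combining classifier outputs; the key point the abstract makes is that fusing complementary acoustic and visual views outperforms either view alone.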
