Abstract

Automatic classification of musical instruments from audio relies heavily on datasets of acoustic recordings to train instrument models, and such training requires precise labels for each instrument event. These labels are, however, very difficult to obtain, especially in polyphonic performances. OpenMIC-2018 is a polyphonic dataset created specifically for training instrument models, but it is based on weak and incomplete labels. Moreover, automatic sound-event classification based on the VGGish bottleneck layer, as introduced with AudioSet, classifies only one second of audio at a time, making it hard to determine the label of that exact moment. To address these problems, this paper proposes PureMIC, a new strongly labeled dataset (SLD) of 1000 manually labeled single-instrument clips. The proposed model classifies clips over time and, owing to this temporal resolution, also improves the labeling robustness of a large number of unlabeled samples in OpenMIC-2018. We disambiguate and report the automatic labeling of these previously unlabeled samples. The new labels achieve a mean average precision (mAP) of 0.701 on the OpenMIC test data, outperforming the baseline (0.66). The code is released online so that the research community can replicate and build on the proposed implementation.
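As a rough illustration of the pipeline the abstract describes, the sketch below aggregates per-second predictions (matching the one-second, 128-dimensional VGGish frames used by OpenMIC-2018) into clip-level scores and evaluates them with macro mAP. The function names, the max-pooling aggregation, and the toy data are assumptions for illustration only, not the paper's released implementation.

```python
# Minimal sketch, assuming per-frame probabilities are already available.
# OpenMIC-2018 clips are ~10 s long, so each clip yields 10 one-second
# VGGish frames; a frame-level classifier then gives one probability
# vector per second, which we pool over time into a clip-level score.
import numpy as np
from sklearn.metrics import average_precision_score


def clip_predictions(frame_probs: np.ndarray) -> np.ndarray:
    """Aggregate per-second probabilities of shape
    (n_clips, n_frames, n_classes) to clip-level scores by
    max-pooling over the time axis (a common, simple choice)."""
    return frame_probs.max(axis=1)


def mean_average_precision(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Macro mAP: average precision per instrument class, then the mean."""
    aps = [
        average_precision_score(y_true[:, k], y_score[:, k])
        for k in range(y_true.shape[1])
    ]
    return float(np.mean(aps))


# Toy usage with random data: 100 clips, 10 frames, 20 instrument classes
# (OpenMIC-2018 covers 20 instruments).
rng = np.random.default_rng(0)
frame_probs = rng.random((100, 10, 20))          # hypothetical model output
y_true = rng.integers(0, 2, size=(100, 20))      # hypothetical clip labels
print(mean_average_precision(y_true, clip_predictions(frame_probs)))
```

Max-pooling is only one way to collapse the time axis; mean-pooling or learned attention over frames are equally plausible choices, and the paper's released code should be consulted for the actual aggregation used.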
