Abstract

Accurately labeled data is crucial for training machine learning models. For singing-related tasks in the music information retrieval field, accurately labeled data is limited because annotating singing is time-consuming. Several studies create vocal datasets with a two-step annotation method that first produces coarse labels and then applies a manual calibration procedure. However, manually calibrating coarsely labeled singing data is expensive and time-consuming. To address this problem, we propose a singing-label calibration framework that automatically calibrates coarsely labeled singing data with higher accuracy. The framework comprises a data augmentation method to generate training and testing data, a data preprocessing method to handle music audio and symbolic labels, a fully convolutional neural network to estimate the deviation between coarse labels and accurate labels, and a novel calibration function to correct the coarse labels. Experiments show that our model greatly reduces the time cost of the manual calibration process while slightly increasing labeling accuracy. © 2023 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
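
The abstract does not give implementation details, but a minimal sketch of the final calibration step might look like the following. The function name, the per-note deviation representation, and the clipping bound are illustrative assumptions, not the authors' actual method; the deviations would in practice come from the fully convolutional network described above.

```python
# Sketch of the label-calibration idea: a model (placeholder here) estimates,
# per note, how far each coarse onset/offset deviates from the true boundary,
# and a calibration function shifts the coarse labels accordingly.
import numpy as np


def calibrate_coarse_labels(coarse_notes, estimated_deviation, max_shift=0.2):
    """Correct coarse note boundaries with model-estimated deviations.

    coarse_notes        : (N, 2) array of [onset, offset] times in seconds
    estimated_deviation : (N, 2) array of predicted (true - coarse) gaps
    max_shift           : clip deviations to a plausible range (assumed value)
    """
    deviation = np.clip(estimated_deviation, -max_shift, max_shift)
    calibrated = coarse_notes + deviation
    # Keep each note well-formed: the offset must not precede the onset.
    calibrated[:, 1] = np.maximum(calibrated[:, 1], calibrated[:, 0] + 1e-3)
    return calibrated


if __name__ == "__main__":
    # Toy example: two coarsely labeled notes whose onsets are ~50-70 ms late.
    coarse = np.array([[1.05, 1.60], [2.07, 2.55]])
    deviation = np.array([[-0.05, 0.00], [-0.07, 0.02]])  # would come from the model
    print(calibrate_coarse_labels(coarse, deviation))
```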
