Abstract

Singer recognition plays a vital role in music information retrieval systems. Text-independent singer recognition systems achieve good performance by encoding audio features such as voice pitch, intensity, and timbre. However, most such systems are trained and evaluated on songs that mix the singing voice with musical accompaniment, and the background music limits the performance of the resulting singer recognition models. A more robust singer identification system could instead be trained and evaluated on a cappella solo singing voices, which are clean and cover a broad range of vocal qualities. Yet labeled a cappella solo singing data suitable for singer recognition research is scarce. To address this issue, we present Vocal92, a multivariate a cappella solo singing and speech audio dataset of approximately 146.73 hours sourced from volunteers. We further use two current state-of-the-art models to construct baseline singer recognition systems. In experiments, the singer recognition model trained on a cappella solo singing data performs well on both single-mode and cross-modal verification data, significantly outperforming related work. The dataset is publicly available at https://pan.baidu.com/s/1Pn62DHfal2OOZ_5JqgGBdQ (access code: jnz5). It will also be made available free of charge for non-commercial use on IEEE DataPort after acceptance.
