Abstract

Singing voice detection, or vocal detection, is a classification task that determines whether a given audio segment contains a singing voice. It is a crucial preprocessing step that can improve the performance of other tasks, among them automatic lyrics alignment, singing melody transcription, singing voice separation, and vocal melody extraction. This paper surveys singing voice detection techniques, with particular attention to state-of-the-art algorithms such as convolutional LSTM and GRU-RNN. It compares existing methods, mainly on the Jamendo and RWC datasets. Long-term recurrent convolutional networks have achieved impressive results on public datasets. The main goal of the paper is to investigate both classical and state-of-the-art approaches to singing voice detection.

Highlights

  • This paper aims to fill this gap by investigating the classical approaches to singing voice detection (SVD) [13], which exploit the acoustic similarity between singing voice and speech using cepstral coefficients [13] and linear predictive coding [14]

  • The authors found that discrete Fourier transform (DFT) coefficients achieved higher detection accuracy, averaged over 10 trials and evaluated across all epochs, than Mel-frequency cepstral coefficients (MFCCs) and raw pulse-code modulation (PCM)

  • The results show that the long short-term memory recurrent neural network (LSTM-RNN) outperforms all other methods in the statistical benchmarks
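
As a rough illustration of the input representations compared in the highlights above, the sketch below turns raw PCM samples into frame-wise DFT magnitude features, the kind of per-frame input a vocal/non-vocal classifier could consume. It is not the pipeline of any surveyed method: the function names, the frame length, the hop size, the Hann window, and the synthetic test tone are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(pcm, frame_len, hop):
    """Split a 1-D PCM signal into overlapping frames, one frame per row.

    Assumes len(pcm) >= frame_len; trailing samples that do not fill a
    whole frame are dropped.
    """
    n_frames = 1 + (len(pcm) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return pcm[idx]

def dft_features(pcm, frame_len=1024, hop=512):
    """Magnitude of the real DFT of each Hann-windowed frame (hypothetical helper)."""
    frames = frame_signal(pcm, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for audio: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
pcm = np.sin(2 * np.pi * 440 * t)

feats = dft_features(pcm)   # one row of DFT magnitudes per frame
print(feats.shape)          # (number of frames, frame_len // 2 + 1)
```

Each row of `feats` would then be labeled vocal or non-vocal and passed to a classifier; the raw-PCM alternative mentioned above would instead use the rows of `frame_signal(pcm, 1024, 512)` directly.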


Summary

Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
