Abstract

Speech and singing voice discrimination is an important task in the speech processing area, given that each type of voice requires different information retrieval and signal processing techniques. This discrimination task is hard even for humans, depending on the length of the voice segments. In this article, we present an automatic speech and singing voice classification method using pitch parameters derived from musical note information and f0 stability analysis. We applied our method to a database containing speech and a cappella singing and compared the results with other discrimination techniques based on information derived from pitch and spectral envelope. Our method obtains good results discriminating both voice types, has good generalisation capabilities and is computationally efficient. In the process, we have also created a note detection algorithm with parametric control of the characteristics of the notes it detects. We compared the agreement of this algorithm with a state-of-the-art note detection algorithm and performed an experiment showing that speech and singing discrimination parameters can capture generic information about the music style of the singing voice.

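The abstract mentions a note detection algorithm driven by f0 stability, with parametric control over the characteristics of the notes it detects. As a rough, hypothetical illustration of that idea (not the paper's actual algorithm), the Python sketch below marks a voiced run of frames as a candidate note when its pitch stays within a cent tolerance of the run's median for at least a minimum duration; the function name, hop size, tolerance and duration values are all assumptions made for this sketch.

```python
import numpy as np

def detect_notes(f0_hz, hop_s=0.01, tol_cents=50.0, min_dur_s=0.1):
    """Mark frames belonging to pitch-stable voiced runs as candidate notes.

    f0_hz : frame-level pitch track in Hz, 0 (or NaN) on unvoiced frames.
    Returns a boolean array, True where a frame is part of a detected note.
    The cent tolerance and minimum duration are the kind of parametric
    controls the article refers to; their values here are arbitrary.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = np.nan_to_num(f0_hz) > 0
    is_note = np.zeros(len(f0_hz), dtype=bool)
    min_len = max(1, int(round(min_dur_s / hop_s)))

    start = None
    for i, v in enumerate(np.append(voiced, False)):  # sentinel closes the last run
        if v and start is None:
            start = i
        elif not v and start is not None:
            run = f0_hz[start:i]
            # Per-frame deviation from the run's median pitch, in cents.
            cents = 1200.0 * np.abs(np.log2(run / np.median(run)))
            if len(run) >= min_len and np.all(cents <= tol_cents):
                is_note[start:i] = True
            start = None
    return is_note
```

The intuition is that sustained sung notes produce long runs of frames that stay within such a tolerance, whereas the continuously gliding pitch of speech rarely does.
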
Highlights

  • Discrimination of speech and singing is not an easy task even for humans, who need approximately one-second-long segments to discriminate singing and speaking voices with more than 95% accuracy [1]

  • We compared the agreement of this algorithm with a state-of-the-art note detection algorithm and performed an experiment showing that speech and singing discrimination parameters can capture generic information about the music style of the singing voice

  • The pitch parameters we propose for the classification of each segment are the proportion of voiced segments (PV) and the percentage of pitch labelled as a musical note (PN); a sketch of how these parameters might be computed follows this list

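Taking the last highlight literally, here is a minimal Python sketch of how PV and PN might be computed from a frame-level pitch track and per-frame note labels. Interpreting PV as the fraction of voiced frames and normalising PN by the voiced frames are assumptions of this sketch, as is every name in it; the article's exact definitions may differ.

```python
import numpy as np

def pitch_parameters(f0_hz, is_note):
    """Segment-level pitch parameters for speech/singing classification.

    f0_hz   : frame-level pitch track in Hz, 0 (or NaN) on unvoiced frames.
    is_note : boolean per frame, True where the pitch was labelled as a
              musical note (e.g. by a stability-based detector such as the
              detect_notes sketch above).
    Returns (PV, PN) as fractions in [0, 1].
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    is_note = np.asarray(is_note, dtype=bool)
    voiced = np.nan_to_num(f0_hz) > 0

    pv = float(voiced.mean()) if len(voiced) else 0.0            # proportion of voiced frames
    pn = float(is_note[voiced].mean()) if voiced.any() else 0.0  # note-labelled share of voiced pitch
    return pv, pn

# A cappella singing would tend to give a high PN (most voiced pitch sits in
# stable notes), while read or spontaneous speech would tend to give a low PN.
```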

Introduction

Discrimination of speech and singing is not an easy task even for humans, who need segments approximately one second long to discriminate singing and speaking voices with more than 95% accuracy [1]. When spoken segments are repeated, rhythm patterns appear and the repeated segments are perceived as singing [2,3]. Even though singing and speech are closely related and difficult for humans to distinguish, speech technologies developed for the spoken voice are not directly applicable to the singing voice. Many works have addressed the problem of speech and music discrimination [6,7,8,9], but these techniques are not directly applicable in the case of a cappella singing because they exploit the presence of music.
