Abstract

Detecting music or speech signals in an audio mixture is an important but challenging problem. Even more challenging is detecting both when they are present in a signal at the same time. This problem requires not only discriminating speech and music from each other but also detecting the presence of each in a mixture with interfering signals. In this paper, we address the problem of detecting speech and music signals in the presence of each other. We focus on leveraging features that capture the structural properties of audio to improve the performance of concurrent music-speech detection. Continuous Frequency Activation (CFA) is used to account for sustained pitch/harmonic activity, and a new feature called Transient Activation (TAC) is proposed for transient/percussive activity in an audio signal. The effectiveness of these features, along with other acoustic features, is evaluated in different statistical classification schemes. Feature selection is conducted to find the feature set that maximizes detection performance. Experimental results on real-world broadcast recordings show significant improvement from using these techniques to incorporate the structural information of audio.
