Abstract
Detecting music or speech in an audio mixture is an important but challenging problem. Detecting both when they occur in a signal at the same time is more challenging still. This problem requires not only discriminating speech and music from each other but also detecting the presence of each in a mixture with interfering signals. In this paper, we address the problem of detecting speech and music signals in the presence of each other. We focus on leveraging features that capture the structural properties of audio to improve the performance of concurrent music-speech detection. Continuous Frequency Activation (CFA) is used to account for sustained pitch/harmonic activity, and a new feature called Transient Activation (TAC) is proposed for transient/percussive activity in an audio signal. The effectiveness of these features, along with other acoustic features, is evaluated in different statistical classification schemes. Feature selection is conducted to find the feature set that maximizes detection performance. Experimental results on real-world broadcast recordings show significant improvement from using these techniques to incorporate the structural information of audio.