Abstract

Recent research on speech segregation and music fingerprinting has improved both speech segregation and music identification algorithms. Speech and music segregation generally involves identifying the music content first, followed by segregating the speech. However, music segregation becomes challenging in the presence of noise. This paper proposes a novel method of speech segregation for unlabelled stationary noisy audio signals using a deep belief network (DBN) model. The proposed method successfully segregates a music signal from noisy audio streams. A recurrent neural network (RNN)-based hidden layer segregation model is applied to remove stationary noise, and dictionary-based Fisher algorithms are employed for speech classification. The proposed method is tested on three datasets (TIMIT, MIR-1K, and MusicBrainz), and the results indicate the robustness of the proposed method for speech segregation. The qualitative and quantitative analysis carried out on the three datasets demonstrates the efficiency of the proposed method compared to state-of-the-art speech segregation and classification-based methods.

Highlights

  • The rapid growth of open-source multimedia content in the past few decades demands the development of efficient audio and visual content analysis techniques

  • This paper presents a novel model for speech segregation using a noisy audio sample

  • While audio speech segregation algorithms are currently used in many applications, segregating speech from an audio signal in the presence of background white and pink noise remains challenging, because environmental noise corrupts the contextual information required for audio segregation
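To make the noise conditions in the highlights concrete, the sketch below generates white and pink (1/f) noise and mixes it into a speech signal at a target signal-to-noise ratio. This is an illustrative setup only, not part of the paper's method; the function names and the FFT-based pink-noise shaping are the author's own assumptions.

```python
import numpy as np

def white_noise(n, rng=None):
    """Flat-spectrum Gaussian noise (illustrative helper)."""
    rng = rng or np.random.default_rng(0)
    return rng.standard_normal(n)

def pink_noise(n, rng=None):
    """1/f-power noise: shape a white spectrum by 1/sqrt(f)."""
    rng = rng or np.random.default_rng(0)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]            # avoid division by zero at DC
    spectrum /= np.sqrt(f)
    x = np.fft.irfft(spectrum, n)
    return x / np.max(np.abs(x))

def add_noise(speech, noise, snr_db):
    """Mix noise into speech at a target SNR in decibels."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example: a 440 Hz tone standing in for speech, corrupted at 10 dB SNR
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = add_noise(speech, pink_noise(len(speech)), snr_db=10.0)
```

Stationary noises like these are exactly the conditions the RNN-based denoising stage described in the abstract is meant to handle.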


Introduction

The rapid growth of open-source multimedia content in the past few decades demands the development of efficient audio and visual content analysis techniques. Speech segregation and recognition from audio-visual content, available either online or offline, depends on the quality and content of the audio signal [1]. Available audio content can contain noise, and musical segments pose a particular problem during audio content analysis, especially where speech segregation is needed. Significant research effort has yielded partial solutions, but the challenge remains. Noise garbles speech and introduces obstacles in various applications, including automatic speech segregation. Removing noise from audio speech signals enhances the accuracy of speech recognition and segregation applications [2].
