Abstract

In this study, a robust speech activity detection (SAD) system that works under noisy conditions is designed; it consists of a neural network classifier and a speaker diarization module. The performance of baseline SAD systems degrades significantly under noisy conditions. In such conditions, a neural network classifier that uses more advanced features detects speech activity more successfully than baseline features and classification methods do. It is further shown that speaker diarization can be exploited to advantage in the speech activity detection problem, and the two components are combined into a two-stage speech activity detection system. The system performs robustly on real-life office recordings, where baseline SAD systems perform poorly.
