Abstract

As an indispensable front-end component, a voice activity detection (VAD) system must be robust under all kinds of conditions. In this paper, we propose a VAD system based on a fusion model. A supervised fusion strategy is introduced to improve system performance across diverse data domains. We evaluate the proposed system on the development datasets of Public Safety Communications (PSC), Video Annotation for Speech Technologies (VAST), and Babel from the NIST Open Speech Analytic Technologies 2019 (OpenSAT19) evaluation, each of which poses its own challenges for VAD systems. Experimental results show the robustness of our fusion model. Compared to the baseline system, the proposed system achieves better performance under the official OpenSAT19 evaluation metrics on all three datasets.
