Abstract

Voice activity detection (VAD) determines whether incoming signal segments are speech or noise, and it is an important technique in almost all speech-related applications. To improve VAD performance in various noise environments, characterizing speech features has been the most crucial issue to date. Among the speech features proposed so far, the context information of speech through time and vowel sound characteristics are known to be the current state of the art. To exploit both of these merits, we propose a vowel-based VAD using a long short-term memory recurrent neural network (LSTM-RNN). The LSTM-RNN is known to be a powerful model for capturing dynamic context information through time. Moreover, by training the LSTM-RNN on vowel sounds only rather than on all speech, the network can learn more effectively because of the reduced manifold of speech. In our experiments, the proposed method shows better accuracy not only on the VAD task, compared with LSTM-RNN-based VAD, but also on a vowel detection task.

