Abstract
Keyword spotting (KWS) is one of the important systems on speech applications, such as data mining, call routing, call center, customer-controlled smartphone, smart home systems with voice control, etc. With the goals of researching some factors affecting the Vietnamese Keyword spotting system, we study the combination architecture of CNN (Convolutional Neural Networks)-RNN (Recurrent Neural Networks) on both clean and noise environments with 2 distance speaker cases: 1m and 2m. The obtained results show that the noise trained models are better performance than clean trained models in any (clean or noise) testing environment. The results in this far-field experiment suggest to us how to choose the suitable distance of the recording microphones to the speaker so that there is no redundancy of data with the contexts considered to be the same.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.