Abstract
It is well known that the performance of a keyword spotting (KWS) system degrades significantly when training and test conditions are mismatched. In this work, an approach is proposed for reducing the mismatch between training and test speech caused by speaker-related variability and environmental noise. In the proposed approach, variational mode decomposition is first performed on the short-term magnitude spectra to decompose them adaptively into a number of variational mode functions (VMFs). Sufficiently smoothed spectra are then reconstructed from only the two lower-frequency VMFs. When the KWS system is developed using Mel-frequency cepstral coefficients (MFCCs) extracted from the smoothed spectra, significantly improved performance is observed under pitch- and noise-mismatched test conditions. To further suppress the mismatch due to the pitch and speaking rate of the speakers, data-augmented training based on explicit prosody modification is performed. The experimental results presented in this study show that data-augmented training further enhances the performance of the developed KWS system.
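The spectral smoothing step can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a simplified variational mode decomposition (mirror extension, Wiener-filter mode updates, centre-frequency updates, no dual-ascent term) written with NumPy. The function names `vmd` and `smooth_spectrum`, the number of modes, and the bandwidth penalty `alpha` are assumptions chosen for illustration, and "two lower-frequency VMFs" is read here as the two modes with the lowest centre frequencies.

```python
import numpy as np


def vmd(signal, num_modes=4, alpha=2000.0, num_iters=300, tol=1e-8):
    """Simplified VMD sketch. Returns (modes, centres), sorted by centre frequency.

    num_modes, alpha, num_iters and tol are illustrative assumptions,
    not values taken from the paper.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    half = n // 2
    # Mirror-extend both ends to reduce boundary effects.
    ext = np.concatenate([x[:half][::-1], x, x[n - half:][::-1]])
    freqs = np.fft.rfftfreq(ext.size)   # one-sided axis, cycles per sample
    f_hat = np.fft.rfft(ext)            # one-sided spectrum of the input
    u_hat = np.zeros((num_modes, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(num_modes) / num_modes  # uniform initial centres

    for _ in range(num_iters):
        previous = u_hat.copy()
        for k in range(num_modes):
            # Part of the input not yet explained by the other modes.
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            # Wiener-like filter concentrated around the current centre.
            u_hat[k] = residual / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # New centre: power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.dot(freqs, power) / (power.sum() + 1e-12)
        change = np.sum(np.abs(u_hat - previous) ** 2)
        if change / (np.sum(np.abs(previous) ** 2) + 1e-12) < tol:
            break

    order = np.argsort(omega)
    # Back to the original domain, then crop away the mirrored parts.
    modes = np.fft.irfft(u_hat[order], n=ext.size, axis=1)[:, half:half + n]
    return modes, omega[order]


def smooth_spectrum(magnitude, num_modes=4, keep=2):
    """Rebuild a magnitude spectrum from its `keep` lowest-frequency modes."""
    modes, _ = vmd(magnitude, num_modes=num_modes)
    return modes[:keep].sum(axis=0)
```

In a full front end, `smooth_spectrum` would be applied to the magnitude spectrum of every short-term frame, and the Mel filterbank, logarithm, and DCT of a standard MFCC pipeline would then operate on the smoothed spectra instead of the raw ones.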