Abstract

One way to improve automatic speech recognition (ASR) performance on the latest mobile devices, which can be used in a wide variety of noisy environments, is to take advantage of the small microphone arrays embedded in them. Since classic beamforming techniques perform rather poorly with small microphone arrays, specific techniques are being developed to exploit this new capability efficiently for noise-robust ASR. In this study, a novel dual-channel minimum mean square error (MMSE)-based feature compensation method is proposed, relying on a vector Taylor series (VTS) expansion of a dual-channel speech distortion model. In contrast to the single-channel VTS approach (which can be considered the state of the art for feature compensation), the authors' technique particularly benefits from the spatial properties of speech and noise. Their proposal is assessed on a dual-microphone smartphone (a particular case of interest) by means of the AURORA2-2C synthetic corpus. Word recognition results, also validated with real noisy speech data, show that the method achieves higher accuracy, clearly outperforming both minimum variance distortionless response (MVDR) beamforming and a single-channel VTS feature compensation approach, especially at low signal-to-noise ratios.
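For readers unfamiliar with VTS feature compensation, the following is a minimal sketch of the generic single-channel baseline the abstract compares against, not the authors' dual-channel method, whose model is not given here. It assumes the common log-Mel additive-noise model y = x + log(1 + exp(n - x)) for a single filterbank bin, no convolutive channel term, a Gaussian mixture (GMM) prior on clean speech, and a Gaussian noise model. The function names `log_add_noise` and `vts_mmse_estimate` are hypothetical. Each clean-speech Gaussian is mapped to the noisy domain with a first-order VTS expansion, and the MMSE estimate is the posterior-weighted sum of the per-component conditional means.

```python
import math

def log_add_noise(x, n):
    """Assumed log-Mel additive-noise model (one bin): y = x + log(1 + exp(n - x))."""
    return x + math.log1p(math.exp(n - x))

def vts_mmse_estimate(y, weights, means_x, vars_x, mu_n, var_n):
    """Hypothetical MMSE estimate of clean log-Mel x from noisy y (single bin).

    Each clean-speech Gaussian k is propagated to the noisy domain with a
    first-order VTS expansion of log_add_noise around (means_x[k], mu_n).
    """
    log_liks, cond_means = [], []
    for w, mx, vx in zip(weights, means_x, vars_x):
        g = 1.0 / (1.0 + math.exp(mu_n - mx))          # dy/dx = sigmoid(x - n); dy/dn = 1 - g
        mu_y = log_add_noise(mx, mu_n)                  # zeroth-order term gives the noisy mean
        var_y = g * g * vx + (1.0 - g) ** 2 * var_n     # linearised noisy variance
        cov_xy = g * vx                                 # cross-covariance between x and y
        # Log of w_k * N(y; mu_y, var_y)
        log_liks.append(math.log(w) - 0.5 * (math.log(2.0 * math.pi * var_y)
                                             + (y - mu_y) ** 2 / var_y))
        # E[x | y, k] for jointly Gaussian (x, y)
        cond_means.append(mx + cov_xy / var_y * (y - mu_y))
    # Component posteriors P(k | y), normalised in the log domain for stability
    peak = max(log_liks)
    post = [math.exp(l - peak) for l in log_liks]
    total = sum(post)
    return sum(p / total * c for p, c in zip(post, cond_means))
```

As a sanity check of the design, when the noise is negligible (very low `mu_n`) the Jacobian tends to 1 and the estimate returns the observation unchanged. A dual-channel extension would, under this framing, replace the scalar model with a joint model of both microphones' features, which is where the spatial properties mentioned above would enter.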
