Abstract
The performance of an automatic speech recognition (ASR) system degrades significantly in the presence of noise and reverberation. Two observations motivate this work. First, the autoregressive (AR) modeling approach, which preserves the high-energy regions of the signal that are less susceptible to noise, offers a potential method for robust feature extraction. Second, the speech signal exhibits strong correlations in the spectro-temporal domain that are generally absent in noise. In this letter, we propose a novel method for speech feature extraction that combines the advantages of the AR approach and joint time-frequency processing using multivariate AR (MAR) modeling. Specifically, subband discrete cosine transform (DCT) coefficients obtained from multiple speech bands are used in the MAR framework to derive Riesz temporal envelopes, which provide the features for ASR. We perform several speech recognition experiments on the Aurora-4 database with clean and multicondition training. In these experiments, the proposed features provide significant improvements over other noise-robust feature extraction methods (relative improvements of 24% in clean training and 14% in multicondition training over mel features). Furthermore, speech recognition experiments on the REVERB challenge database illustrate the extension of the MAR modeling method to the suppression of reverberation artifacts.
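To make the described pipeline concrete, the following Python sketch illustrates the general idea of deriving smooth temporal envelopes from subband DCT trajectories via multivariate linear prediction. This is not the authors' implementation: the function names, frame and subband sizes, the least-squares VAR fit, and the use of a Hilbert (analytic-signal) magnitude as a stand-in for the Riesz envelope are all assumptions made for illustration.

```python
# Minimal sketch of a MAR-style envelope feature pipeline (illustrative only).
# Assumptions: 16 kHz audio, 25 ms frames with 10 ms hop, 16 subbands,
# least-squares VAR fit, Hilbert magnitude as a surrogate for the Riesz envelope.
import numpy as np
from scipy.fft import dct
from scipy.signal import hilbert


def subband_dct(signal, frame_len=400, hop=160, n_subbands=16):
    """Frame the signal, take a per-frame DCT, and group the coefficients
    into contiguous subbands; each subband yields one temporal trajectory."""
    frames = np.stack([signal[i:i + frame_len]
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    coeffs = dct(frames, type=2, norm='ortho', axis=1)        # (n_frames, frame_len)
    bands = np.array_split(coeffs, n_subbands, axis=1)
    # Mean coefficient per subband serves as that subband's trajectory.
    return np.stack([b.mean(axis=1) for b in bands], axis=1)  # (n_frames, n_subbands)


def fit_mar(X, order=8):
    """Fit a multivariate (vector) AR model by least squares and return the
    one-step predictions, i.e. the smooth AR-modeled trajectories."""
    T, d = X.shape
    Y = X[order:]                                              # targets X[t]
    Z = np.hstack([X[order - k:T - k] for k in range(1, order + 1)])  # lags
    A, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return Z @ A                                               # (T - order, d)


def mar_envelopes(signal):
    """Derive smooth temporal envelopes of the subband DCT trajectories.
    The analytic-signal magnitude is used here as a simple surrogate
    for the Riesz envelope described in the paper."""
    X = subband_dct(signal)
    X_ar = fit_mar(X)
    return np.abs(hilbert(X_ar, axis=0))                       # envelope per subband


# Example: envelope features for one second of noise-like 16 kHz audio.
features = mar_envelopes(np.random.randn(16000))
print(features.shape)   # (n_frames - order, n_subbands)
```

In this sketch the multivariate AR fit captures correlations across subbands jointly rather than modeling each subband trajectory independently, which is the essential property the MAR framework exploits.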