Abstract

Automatic speech recognition (ASR) is now in common use. Current ASR systems perform well in ideal environments but poorly in realistic noisy ones. To make ASR robust to noise, ETSI has standardized the Advanced Front-End (AFE), which adopts a two-stage iterative Wiener filter (IWF) for speech enhancement at the front end of ASR. In the ETSI AFE, 16 kHz speech is split uniformly by a Quadrature Mirror Filter (QMF) into lower-band and higher-band signals. Only the lower-band signal is enhanced by the IWF, and Mel Frequency Cepstral Coefficients (MFCCs) are extracted from both the lower-band signal enhanced by the second stage and the unenhanced higher-band signal. The FFT is used to estimate the speech spectrum from which the Wiener filter is designed. We have previously proposed a robust complex speech analysis for analytic signals, which estimates the speech spectrum more robustly and more accurately owing to its robust criterion and the nature of the analytic signal. This paper proposes an improved AFE that uses wide-band robust ELS (Extended Least Square) complex analysis and real-valued analysis instead of the FFT. In addition, QMF synthesis is introduced for the second-stage wide-band analysis. Experimental results on the CENSREC-2 speech database demonstrate that recognition performance is improved.
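
To illustrate how an estimated spectrum determines a Wiener filter, the following is a minimal sketch of a generic textbook Wiener gain computed per frequency bin. It is not the ETSI AFE procedure and not the method proposed in this paper; the function name `wiener_gain`, the `floor` parameter, and the use of simple power subtraction to estimate the clean-speech power are all assumptions made for illustration.

```python
def wiener_gain(noisy_psd, noise_psd, floor=1e-3):
    """Per-bin Wiener gain H = P_s / (P_s + P_n).

    noisy_psd: power spectrum of the noisy speech frame (one value per bin)
    noise_psd: estimated noise power spectrum (same length)
    floor:     hypothetical lower bound on the gain, to limit over-suppression
    """
    gains = []
    for p_x, p_n in zip(noisy_psd, noise_psd):
        # Assumed clean-speech power estimate: power subtraction, clamped at zero.
        p_s = max(p_x - p_n, 0.0)
        denom = p_s + p_n
        # Wiener gain for this bin; zero if both powers are zero.
        g = p_s / denom if denom > 0.0 else 0.0
        gains.append(max(g, floor))
    return gains


# Usage: a bin dominated by speech keeps most of its energy,
# while a bin at the noise level is attenuated down to the floor.
gains = wiener_gain([4.0, 1.0], [1.0, 1.0])
```

The accuracy of such a gain depends directly on the quality of the spectral estimate fed into it, which is why replacing the FFT-based estimate with a more robust analysis can change the enhancement result.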
