Abstract
A new log-power domain feature enhancement algorithm named NLPS is developed. It consists of two parts, direct solution of nonlinear system model and log-power subtraction. In contrast to other methods, the proposed algorithm does not need prior speech/noise statistical model. Instead, it works by direct solution of the nonlinear function derived from the speech recognition system. Separate steps are utilized to refine the accuracy of estimated cepstrum by log-power subtraction, which is the second part of the proposed algorithm. The proposed algorithm manages to solve the speech probability distribution function (PDF) discontinuity problem caused by traditional spectral subtraction series algorithms. The effectiveness of the proposed filter is extensively compared using the standard database, AURORA2. The results show that significant improvement can be achieved by incorporating the proposed algorithm. The proposed algorithm reaches a recognition rate of over 86% for noisy speech (average from SNR 0 dB to 20 dB), which means a 48% error reduction over the baseline Mel-frequency Cepstral Coefficient (MFCC) system.
Highlights
The main objective of speech recognition is to get a higher recognition rate
Comparison is made against Mel-frequency Cepstral Coefficient (MFCC), minimum mean square error (MMSE)-short-time spectral amplitude (STSA) [3], Spectral Subtraction (SS) [8], Cepstral Mean Variance Normalization (CMVN), AFE [9], and Mean Variance Normalization and ARMA filtering (MVA) [10]
A novel algorithm for robust speech recognition system is presented with its detailed derivation, implementation, and evaluation
Summary
The main objective of speech recognition is to get a higher recognition rate. lots of factors tend to degrade the performance of automatic speech recognition (ASR) system, such as environmental noise, channel distortion, and speaker variability [1, 2]. Noise reduction or clean speech estimation is a straight forward “feature” approach to improve the performance of ASR systems. Ephraim derived the short-time spectral amplitude (STSA) estimator using minimum mean square error (MMSE) in 1984 [3], which has become a standard approach for clean speech estimation in speech processing. The success of MMSE in previous implementation reveals that it is one of the means to improve the performance of an automatic speech recognition system (ASR). It is not necessarily the only one.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.