Abstract
In recent decades, researchers have focused on developing noise-robust methods that compensate for noise effects in automatic speech recognition (ASR) systems and improve their performance. In this paper, we propose a feature-based noise-robust method that employs a novel data analysis technique, robust principal component analysis (RPCA). In the proposed scheme, RPCA is applied to a noise-corrupted speech feature matrix, and the resulting sparse component is shown to reveal speech-dominant characteristics. One apparent advantage of using RPCA to enhance noise robustness is that no prior knowledge about the noise is required. The proposed RPCA-based method is evaluated on the Aurora-4 database and task, using a state-of-the-art deep neural network (DNN) architecture for the acoustic models. The evaluation results indicate that the proposed method significantly improves the recognition accuracy obtained with the original speech features, and that it can be cascaded with mean normalization (MN), mean and variance normalization (MVN), and relative spectral (RASTA) processing, three well-known and widely used feature robustness algorithms, to achieve better performance than either component method used alone.
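For reference, RPCA is commonly posed as the convex program below, known as principal component pursuit. The symbols are assumptions introduced here for illustration and do not come from the abstract: D is the noise-corrupted feature matrix, L its low-rank part, S its sparse part, and λ a positive weight.

```latex
\min_{L,\,S}\; \lVert L \rVert_{*} + \lambda \lVert S \rVert_{1}
\quad \text{subject to} \quad D = L + S
```

Here \lVert L \rVert_{*} is the nuclear norm (the sum of singular values) and \lVert S \rVert_{1} is the sum of absolute entries. A common default for an m-by-n matrix is λ = 1/\sqrt{\max(m, n)}; whether the paper uses this value is not stated here.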
Highlights
Automatic speech recognition (ASR) applications are now common in daily life; examples include voice-based command control of a robot, speech recognition on mobile devices, and speech-related web search.
We propose a feature-based method that uses robust principal component analysis (RPCA) [28,29] to extract noise-robust speech features.
Filter-bank coefficients (FBANK) with a deep neural network (DNN)-HMM outperform mel-frequency cepstral coefficients (MFCC) with a GMM-HMM, giving a lower word error rate (WER), which is generally attributed to the deep learning scheme in the DNN.
Summary
Automatic speech recognition (ASR) applications are now common in daily life; examples include voice-based command control of a robot, speech recognition on mobile devices, and speech-related web search. In [30], RPCA is applied to the spectrogram of speech signals, and the resulting sparse component is shown to contain less noise and to be noise-robust. Another speech enhancement method, proposed in [31], first decomposes a speech signal into sub-bands via a wavelet transform and uses RPCA to extract the low-rank component of the matrix formed by the overlapped frames of each sub-band signal; the final output is the inverse wavelet transform of the low-rank sub-band signals. We show that the new method is complementary to the prevalent mean normalization (MN) and relative spectral (RASTA) methods and further improves recognition performance when combined with them. These evaluation results indicate that the proposed method is a promising way to enhance ASR and can broaden its applications in real environments. For more details about the principal component pursuit (PCP) method, see [28,29].
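The decomposition and cascade described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it solves the PCP problem with a generic inexact augmented Lagrangian iteration, and it assumes a feature matrix with frames as rows and feature dimensions as columns. The function names (`rpca_ialm`, `mvn`), the parameter defaults, and the choice to apply MVN after RPCA are all assumptions made for the example.

```python
import numpy as np


def shrink(X, tau):
    """Element-wise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def rpca_ialm(D, lam=None, rho=1.5, tol=1e-7, max_iter=200):
    """Split D into low-rank L and sparse S with D ~= L + S.

    Hypothetical helper: a generic inexact-ALM solver for PCP,
    not code taken from the paper.
    """
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    norm_fro = np.linalg.norm(D)       # Frobenius norm
    if norm_fro == 0.0:
        return np.zeros_like(D), np.zeros_like(D)
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default weight (assumed)
    mu = 1.25 / np.linalg.norm(D, 2)    # initial penalty from spectral norm
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                # Lagrange multiplier
    for _ in range(max_iter):
        L = svd_threshold(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        R = D - L - S                   # constraint residual
        Y = Y + mu * R
        mu *= rho                       # increase the penalty each pass
        if np.linalg.norm(R) <= tol * norm_fro:
            break
    return L, S


def mvn(X, eps=1e-12):
    """Mean and variance normalization of each feature dimension (column)."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)


# Usage on random stand-in data (not real speech features):
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 40))   # 200 frames x 40 dimensions
low_rank, sparse = rpca_ialm(features)
robust_features = mvn(sparse)                # cascade: RPCA, then MVN
```

The sparse part is passed on because the abstract reports that it reveals speech-dominant characteristics; the order of the cascade and the feature domain in which RPCA is applied should be checked against the paper.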