Subspace-based Speech Enhancement Research Articles

In recent years, speech recognition technology has become increasingly common. Speech quality and intelligibility are critical for the convenience and accuracy of information transmission in speech recognition. Speech processing systems used to convey or store speech are usually designed for an environment without background noise. In real-world environments, however, interference in the form of background noise and channel noise drastically reduces the performance of speech recognition systems, resulting in imprecise information transfer and listener fatigue. Speech enhancement techniques aim to improve the performance of communication systems whose input or output signals are affected by noise. To ensure the correctness of the text produced from speech, the external noise in the speech audio must be reduced. This is difficult because the speech may consist of single words, continuous speech, or spontaneous speech. Several typical speech enhancement algorithms for automatic speech recognition have gained considerable attention. However, these algorithms work well only on simple and continuous audio signals. This study therefore proposes a hybridized speech recognition algorithm to improve speech recognition accuracy. Non-linear spectral subtraction, a well-known speech enhancement algorithm, is optimized with the Hidden Markov Model and tested on 6660 medical speech transcription audio files and 1440 Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) audio files. The performance of the proposed model is compared with that of typical speech enhancement algorithms, such as the iterative signal enhancement algorithm, subspace-based speech enhancement, and non-linear spectral subtraction. The proposed cascaded hybrid algorithm achieved a minimum word error rate of 9.5% for medical speech and 7.6% for RAVDESS speech. Cascading the speech enhancement and speech-to-text conversion architectures results in higher accuracy for enhanced speech recognition. The evaluation results support incorporating the proposed method into real-time automatic speech recognition applications in medicine, where the terminology involved is complex.
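
The abstract above reports its results as word error rates. As a rough illustration of how that metric is commonly computed (word-level edit distance divided by the number of reference words), here is a minimal sketch. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the paper; the paper's own scoring setup is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is a plain whitespace split,
    which is an assumption, not the paper's procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One reference word ("a") is missing from the hypothesis.
    print(word_error_rate("the patient has a fever", "the patient has fever"))
```

Under this definition, a word error rate of 9.5% corresponds to roughly one word-level error (substitution, deletion, or insertion) per ten to eleven reference words.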

The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, and on the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition (ASR) experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
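
The decomposition described in this abstract can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the estimators reviewed in the paper: the noise is assumed to be white with a known variance `noise_var` (the abstract instead relies on an estimate of the noise correlation matrix and targets coloured noise), and the singular values are weighted with a simple Wiener-like gain. The names `hankel_matrix`, `average_antidiagonals`, `subspace_enhance`, `noise_var`, and `rank` are all hypothetical.

```python
import numpy as np


def hankel_matrix(frame: np.ndarray, rows: int) -> np.ndarray:
    """Build a rows x cols Hankel matrix with H[i, j] = frame[i + j]."""
    if not 1 <= rows <= len(frame):
        raise ValueError("rows must be between 1 and len(frame)")
    cols = len(frame) - rows + 1
    return np.array([frame[i:i + cols] for i in range(rows)])


def average_antidiagonals(H: np.ndarray) -> np.ndarray:
    """Map a (possibly non-Hankel) matrix back to a signal by averaging
    all entries that share the same index sum i + j."""
    rows, cols = H.shape
    total = np.zeros(rows + cols - 1)
    count = np.zeros(rows + cols - 1)
    for i in range(rows):
        total[i:i + cols] += H[i]
        count[i:i + cols] += 1
    return total / count


def subspace_enhance(frame, rows, noise_var, rank=None):
    """Sketch of SVD-based subspace filtering for one frame.

    Assumption: white noise, which adds about cols * noise_var to every
    squared singular value of the Hankel matrix.
    """
    frame = np.asarray(frame, dtype=float)
    H = hankel_matrix(frame, rows)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)

    noise_floor = H.shape[1] * noise_var
    # Wiener-like gain: directions dominated by noise are attenuated,
    # directions at or below the noise floor are set to zero.
    gains = np.maximum(0.0, 1.0 - noise_floor / np.maximum(s ** 2, 1e-12))

    if rank is not None:
        # Optional explicit rank reduction: discard everything outside
        # the assumed signal subspace of dimension `rank`.
        gains[rank:] = 0.0

    H_clean = (U * (s * gains)) @ Vt
    return average_antidiagonals(H_clean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(256)
    clean = np.sin(0.2 * n) + np.sin(0.5 * n)
    noisy = clean + rng.normal(0.0, 0.5, size=n.size)
    enhanced = subspace_enhance(noisy, rows=32, noise_var=0.25)
    print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
    print("enhanced MSE:", np.mean((enhanced - clean) ** 2))
```

Leaving `rank=None` keeps every singular direction and relies on the gains alone, which loosely mirrors the abstract's remark about avoiding explicit rank reduction of the noisy Hankel matrix; passing an integer `rank` gives the truncated variant for comparison.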

Articles published on Subspace-based Speech Enhancement

A Hybrid Speech Enhancement Algorithm for Voice Assistance Application.

Design and Implementation of Subspace-Based Speech Enhancement Under In-Car Noisy Environments

Critical Band Subspace-Based Speech Enhancement Using SNR and Auditory Masking Aware Technique

A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition

Multiband subspace tracking speech enhancement for in-car human computer speech interaction

A perceptually motivated approach for speech enhancement

Efficient, high performance, subspace tracking for time-domain data
