Approach For Speech Recognition Research Articles

In this paper, an unsupervised data-driven robust speech recognition approach is proposed, based on joint feature vector normalization and acoustic model adaptation. Feature vector normalization reduces the acoustic mismatch between training and testing conditions by mapping the feature vectors towards the training space. Model adaptation modifies the parameters of the acoustic models to match the test space. However, since neither is optimal, both approaches use an intermediate space between the training and testing spaces to map either the feature vectors or the acoustic models. The joint optimization of both approaches provides a common intermediate space with a better match between normalized feature vectors and adapted acoustic models. Here, feature vector normalization is based on a minimum mean square error (MMSE) criterion. A class-dependent multi-environment model linear normalization (CD-MEMLIN) based on two classes (silence/speech) with a cross probability model (CD-MEMLIN-CPM) is used. CD-MEMLIN-CPM assumes that each class of the clean and noisy spaces can be modeled with a Gaussian mixture model (GMM), and trains a linear transformation for each pair of Gaussians in an unsupervised data-driven process. This feature vector normalization maps the recognition-space feature vectors to a normalized space. The acoustic model adaptation maps the training space to the normalized space by defining a set of linear transformations over an expanded HMM-state space, compensating for degradations that the feature vector normalization cannot model, such as rotations. Experiments were carried out with the Spanish SpeechDat Car and Aurora 2 databases using both the standard Mel-frequency cepstral coefficient (MFCC) and advanced ETSI front-ends. Consistent improvements were obtained for both corpora and both front-ends. Using the standard MFCC front-end, a 92.08% average improvement in WER for Spanish SpeechDat Car and a 69.75% average improvement for the clean condition evaluation of Aurora 2 were obtained, improving on the results reached with the ETSI advanced front-end (83.28% and 67.41%, respectively). Using the ETSI advanced front-end with the proposed solution, a 75.47% average improvement was obtained for the clean condition evaluation of the Aurora 2 database.
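The MMSE-style normalization described in the abstract can be illustrated with a minimal sketch. It assumes a much simpler setting than CD-MEMLIN-CPM: one-dimensional features, a single environment, no silence/speech classes, and one bias-only correction per noisy-space Gaussian rather than a transformation per pair of clean and noisy Gaussians. The names `posteriors`, `normalize`, and `biases`, and all numeric values, are hypothetical and not taken from the paper.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a one-dimensional Gaussian at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(y, weights, means, variances):
    """Posterior probability of each noisy-space Gaussian given y."""
    likes = [w * gaussian_pdf(y, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(likes)
    return [like / total for like in likes]


def normalize(y, weights, means, variances, biases):
    """Simplified MMSE-style estimate: subtract the posterior-weighted bias.

    Assumption: each noisy Gaussian k has a learned bias r_k such that
    clean ~ noisy - r_k. The estimate is y - sum_k p(k | y) * r_k.
    """
    post = posteriors(y, weights, means, variances)
    return y - sum(p * r for p, r in zip(post, biases))


# Hypothetical two-component noisy-space GMM with made-up biases.
WEIGHTS = [0.5, 0.5]
MEANS = [0.0, 10.0]
VARIANCES = [1.0, 1.0]
BIASES = [1.0, 3.0]

if __name__ == "__main__":
    for y in (0.0, 5.0, 10.0):
        print(y, normalize(y, WEIGHTS, MEANS, VARIANCES, BIASES))
```

Because the correction is a posterior-weighted sum, a feature vector close to one Gaussian receives almost exactly that Gaussian's bias, while a vector between Gaussians receives a blend of the two.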

Here we seek to understand the similarities and differences between two speech recognition approaches, namely the HMM/ANN hybrid and the posterior-based segmental model. Both these techniques create local posterior probability estimates and combine these estimates into global posteriors – but they are built on somewhat different concepts and mathematical derivations. The HMM/ANN hybrid combines the local estimates via a formulation that is inherited from the generative HMM concept, while the components of the segment-based model correspond quite directly to the two subtasks of phonetic decoding: segmentation and classification. In this paper we attempt to identify the corresponding components of the segmental model within the hybrid model, with the intent of gaining an insight from this unusual point of view. As regards one of these components, the segment-based phone posteriors, we show that the independence-based product rule combination applied in the hybrid produces strongly biased estimates. As for the other component, the segmentation probability factor, we argue that it is present in the hybrid thanks to the bias of the product rule – that is, the product rule goes wrong in such a special way that it helps the model find the best segmentation of the input. To prove this assertion, we combine this bias with the posterior estimates obtained by averaging, and find that the resulting ‘averaging hybrid’ slightly outperforms the standard one on a phone recognition task and a word recognition task as well. Overall we conclude that the contribution of the product rule to the decoding process is just as important for the segmentation subtask as it is for the segment classification subtask.
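The contrast between the product rule and averaging can be made concrete with a toy sketch. It assumes that a segment is simply a list of frame-level posterior vectors with strictly positive entries, and it ignores the priors, transition probabilities, and segmentation factor of a real HMM/ANN hybrid. The function names and the example values are hypothetical, and the sketch does not reproduce the paper's experiments.

```python
import math


def product_rule(frame_posteriors):
    """Combine frame posteriors by an independence-based product, then renormalize.

    Works in the log domain for numerical stability; assumes every
    posterior is strictly positive.
    """
    n_classes = len(frame_posteriors[0])
    log_scores = [
        sum(math.log(frame[c]) for frame in frame_posteriors)
        for c in range(n_classes)
    ]
    peak = max(log_scores)
    scores = [math.exp(s - peak) for s in log_scores]
    total = sum(scores)
    return [s / total for s in scores]


def averaging_rule(frame_posteriors):
    """Combine frame posteriors by taking their arithmetic mean per class."""
    n_frames = len(frame_posteriors)
    n_classes = len(frame_posteriors[0])
    return [
        sum(frame[c] for frame in frame_posteriors) / n_frames
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Four frames that each mildly prefer class 0 (made-up values).
    segment = [[0.6, 0.4]] * 4
    print("product  :", product_rule(segment))
    print("averaging:", averaging_rule(segment))
```

In this toy case every frame assigns 0.6 to the first class, so averaging also returns 0.6, while the renormalized product rises to about 0.835, and it grows more extreme as the segment gets longer. This only illustrates how the two combination rules diverge; it is not the bias analysis carried out in the paper.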

Articles published on Approach For Speech Recognition

Unsupervised Data-Driven Feature Vector Normalization With Acoustic Model Adaptation for Robust Speech Recognition

A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

Speech Recognition With Flat Direct Models

Hybrid models based on biological approaches for speech recognition

Improved Phoneme-Based Myoelectric Speech Recognition

Word and triphone based approaches in continuous speech recognition for Tamil language

A segment-based interpretation of HMM/ANN hybrids

Recognition of coded speech transmitted over wireless channels

Application of a modified neural fuzzy network and an improved genetic algorithm to speech recognition

Model-Based Feature Compensation for Robust Speech Recognition

Recognition of human speech phonemes using a novel fuzzy approach

Mask estimation for missing data speech recognition based on statistics of binaural interaction

Robust integration for speech features

Exploitation of Morphological Structures in Large Vocabulary Arabic Speech Recognition

Missing-feature approaches in speech recognition

Maximum likelihood sub-band adaptation for robust speech recognition

Product of Gaussians for speech recognition

Using Mel-Frequency Cepstral Coefficients in Missing Data Technique

Noise adaptive speech recognition based on sequential noise parameter estimation

Union: A new approach for combining sub-band observations for noisy speech recognition
