Abstract

In recent years, improving the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. More robust feature extraction methods are therefore needed to improve performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms based on Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coding Coefficient (LPCC), Perceptual Linear Prediction (PLP), and RASTA-PLP in noisy conditions, using a multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposed system is evaluated using the TIDIGIT human voice corpus, recorded from 208 different adult speakers, in both the training and testing processes. The theoretical basis for the speech processing and classifier procedures is presented, and the recognition results are reported in terms of word recognition rate.

Highlights

  • Automatic speech recognition (ASR) is an interactive system used to make speech recognizable by machines.

How to cite this paper: Këpuska, V.Z. and Elharati, H.A. (2015) Robust Speech Recognition System Using Conventional and Hybrid Features of Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coding Coefficient (LPCC), Perceptual Linear Prediction (PLP), RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions.

  • The focus of this study is to experimentally evaluate the effect of noise on different conventional and hybrid feature extraction algorithms based on MFCC, LPCC, PLP, and RASTA-PLP, using a multivariate Hidden Markov Model (HMM) classifier and the TIDIGIT speech corpus.

  • The objective of this research is to evaluate the performance of four feature extraction techniques: MFCC, LPCC, PLP, and RASTA-PLP.


Summary

Introduction

Automatic speech recognition (ASR) is an interactive system used to make speech recognizable by machines. An ASR system has two main parts. The first part, known as the front-end, extracts features from the speech signal. The second part, statistical modeling, known as the back-end, matches these features against a reference model to generate the recognition result using template-matching or classifier techniques [1], such as Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs), Dynamic Time Warping (DTW), or Vector Quantization (VQ). The performance of an ASR system based on an acoustic model depends heavily on the conditions of the training and testing data [2]. As a result, the lack of noise robustness remains a largely unsolved problem in automatic speech recognition research today.
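To make the back-end matching step concrete, the following is a minimal sketch of template matching with Dynamic Time Warping, one of the techniques listed above. It is not the HMM classifier used in this paper; DTW is chosen here only because it fits in a few lines. The function names `dtw_distance` and `recognize`, the two-dimensional toy feature vectors, and the word labels are all illustrative assumptions, standing in for real MFCC, LPCC, PLP, or RASTA-PLP frames.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each sequence is a list of equal-length lists of floats (one per frame).
    The local cost is the Euclidean distance between frames.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of a
    # with the first j frames of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeated against b[j-1]
                cost[i][j - 1],      # frame of b repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognize(features, templates):
    """Return the label of the reference template closest to `features`."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# Hypothetical reference templates: one feature sequence per word.
templates = {
    "one": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    "two": [[5.0, 5.0], [4.0, 4.0], [3.0, 3.0]],
}

# A slightly longer, slightly perturbed version of the "one" template.
utterance = [[0.1, 0.0], [0.9, 1.1], [1.0, 1.0], [2.1, 1.9]]
result = recognize(utterance, templates)
```

Here the utterance has four frames while the templates have three, and the warping path absorbs the difference in length. An HMM back-end replaces the fixed templates with statistical models trained on many utterances, which is one reason its performance depends on how well the training conditions match the test conditions.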

Signal-to-Noise Ratio Estimation
Frame Blocking and Windowing
Speech Feature Extraction
RASTA-PLP
Statistical Modeling
Evaluation
Learning
Decoding
Results
Conclusions
