Abstract

The development of a real-time automatic speech recognition (ASR) system that is better adapted to environmental variability, such as noisy surroundings, speaker variations and accents, has become a high priority. Robustness is required, and it can be achieved at the feature extraction stage, which avoids the need for other pre-processing steps. In this paper, a new robust feature extraction method for real-time ASR systems is presented. A combination of Mel-frequency cepstral coefficients (MFCC) and the discrete wavelet transform (DWT) is proposed. This hybrid system can conserve more of the extracted speech features that tend to be invariant to noise. The main idea is to extract MFCC features by denoising the obtained coefficients in the wavelet domain with a median filter (MF). The proposed system has been implemented on a Raspberry Pi 3, which is a suitable platform for real-time requirements. The experiments showed a high recognition rate (100%) in a clean environment and satisfactory results (ranging from 80% to 100%) in noisy environments at different signal-to-noise ratios (SNRs).

Highlights

  • Speech recognition technology has spread widely and has been applied in many research areas such as mobile robots [1-3], consumer electronics [4], car audio systems [5], security system manipulators [6], and manipulators in industrial assembly lines [7]

  • To demonstrate the performance of the proposed speech recognition algorithm based on median filter (MF)-discrete wavelet transform (DWT)/Mel-frequency cepstral coefficients (MFCC), we compared it against several types of features, such as MFCC, DWT/MFCC and MF-MFCC, using two multiclass approaches, one-against-all (OAA) and one-against-one (OAO). The recognition experiments were performed on noisy test data under different noise conditions: white Gaussian noise and babble noise, with a signal-to-noise ratio (SNR) ranging from -10 dB to 10 dB

  • A comparative study of the feature extraction methods (MFCC, MF-MFCC, DWT/MFCC and MF-DWT/MFCC) under babble and white noise at different signal-to-noise ratios (SNRs) is summarized in Tables IV and V
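
The noisy test conditions mentioned in the highlights (white Gaussian noise at SNRs from -10 dB to 10 dB) can be illustrated with a short sketch. It is not the authors' code: the function names `add_white_noise` and `measured_snr_db`, the fixed seed, and the synthetic sine "speech" signal are all assumptions made for illustration. It relies only on the standard definition SNR(dB) = 10·log10(P_signal / P_noise).

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise scaled to the requested SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10*log10(P_signal / P_noise)  =>  required noise power
    target_noise_power = p_signal / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / p_noise)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR (dB) of `noisy` relative to the known `clean` signal."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


# Synthetic stand-in for a speech frame (assumption: 440 Hz tone at 8 kHz).
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
for snr in (-10, -5, 0, 5, 10):
    noisy = add_white_noise(clean, snr)
    print(snr, round(measured_snr_db(clean, noisy), 3))
```

Babble noise would be mixed in the same way, with a recorded babble segment replacing the Gaussian samples.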


Summary

INTRODUCTION

Speech recognition technology has spread widely and has been applied in many research areas such as mobile robots [1-3], consumer electronics [4], car audio systems [5], security system manipulators [6], and manipulators in industrial assembly lines [7]. Several speech feature extraction methods have been discussed, such as linear prediction coefficients (LPCs), relative spectral-perceptual linear prediction (RASTA-PLP) [14], and linear-predictive cepstral coefficients (LPCCs), which have been used because of their efficiency and simplicity in speech and speaker recognition [15]. Despite its good performance in clean conditions, MFCC feature extraction for speech recognition is weak in noisy environments. The proposed method conserves more speech signal features that are robust against noise effects, and it provides an excellent recognition rate in both clean and noisy environments.
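
As a rough illustration of median filtering in the wavelet domain, the sketch below applies a one-level Haar DWT, median-filters the detail coefficients, and reconstructs the signal. This is a minimal sketch under stated assumptions, not the paper's implementation: the Haar wavelet, the single decomposition level, the choice to filter only the detail coefficients, the window width of 3, and all function names are hypothetical. MFCC extraction would follow this step and is not shown.

```python
import math
from statistics import median


def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence -> (approx, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def median_filter(x, width=3):
    """Sliding median with an odd window; borders are padded by replication."""
    half = width // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(x))]


def denoise(x, width=3):
    """Assumed pipeline: DWT -> median filter on detail band -> inverse DWT."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, median_filter(detail, width))


frame = [0.0, 0.1, 0.2, 0.1, 5.0, 0.1, 0.2, 0.1]  # toy frame with one spike
print(denoise(frame))
```

A practical system would likely use a longer wavelet and several decomposition levels; the Haar case is used here only because it can be written without external libraries.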

PROPOSED ALGORITHM METHODOLOGY
Feature Extraction
Classification
REAL TIME IMPLEMENTATION
TESTS AND RESULTS
CONCLUSION