Fusion Feature Extraction Based on Auditory and Energy for Noise-Robust Speech Recognition

Yanyan Shi, Jing Bai, Dianxi Shi, Peiyun Xue

doi:10.1109/access.2019.2918147

Open Access

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 13
License type: CC BY-NC-ND

Affiliation: Taiyuan University of Technology

Abstract

Environmental noise can threaten the stable operation of current speech recognition systems. It is therefore essential to develop a front-end feature set that can identify speech at low signal-to-noise ratios. In this paper, a robust fusion feature is proposed that can fully characterize speech information. To obtain the cochlear filter cepstral coefficients (CFCC), a novel feature is first extracted using a power-law nonlinear function, which can simulate the auditory characteristics of the human ear. Speech enhancement is then introduced into the front end of feature extraction, and the extracted features and their first-order differences are combined into new mixed features. An energy feature, the Teager energy operator cepstral coefficient (TEOCC), is also extracted and combined with these mixed features to form the fusion feature sets. Principal component analysis (PCA) is then applied to select and optimize the feature set, and the final feature set is used in a speaker-independent, isolated-word, small-vocabulary speech recognition system. Finally, a comparative speech recognition experiment using a support vector machine (SVM) is designed to verify the advantages of the proposed feature set. The experimental results show that the proposed feature set not only displays a high recognition rate and excellent anti-noise performance in speech recognition, but can also fully characterize the auditory and energy information in speech signals.
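
The fusion steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the plain frame-to-frame difference, the SVD-based PCA, and the feature dimensions are all assumptions chosen for illustration, and the cochlear filtering, speech enhancement, cepstral analysis of the Teager energy, and SVM classifier are omitted. Only the discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], is the standard definition.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    # The output is two samples shorter because the end points lack a neighbour.
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def first_order_difference(features):
    """Frame-to-frame difference (rows are frames); the first row is zero.

    Assumption: a plain difference stands in for whatever delta formula
    the paper actually uses.
    """
    features = np.asarray(features, dtype=float)
    delta = np.zeros_like(features)
    delta[1:] = features[1:] - features[:-1]
    return delta


def pca_reduce(features, n_components):
    """Project mean-centred features onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def fuse(auditory, energy, n_components):
    """Concatenate auditory features, their differences, and energy features,
    then reduce the result with PCA (hypothetical helper)."""
    mixed = np.hstack([auditory, first_order_difference(auditory)])
    fused = np.hstack([mixed, energy])
    return pca_reduce(fused, n_components)


if __name__ == "__main__":
    # For a pure tone A*cos(w*n + phi), the operator returns the constant A^2 * sin(w)^2.
    n = np.arange(200)
    psi = teager_energy(2.0 * np.cos(0.3 * n + 0.1))
    print(np.allclose(psi, 4.0 * np.sin(0.3) ** 2))

    # Placeholder per-frame features; the shapes are arbitrary.
    rng = np.random.default_rng(0)
    auditory = rng.normal(size=(50, 12))
    energy = rng.normal(size=(50, 12))
    print(fuse(auditory, energy, n_components=10).shape)
```

The pure-tone check shows why the operator is used as an energy measure: its output depends on both the amplitude and the frequency of the signal, not on amplitude alone.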

Full Text