Improved acoustic models for spontaneous speech recognition

Qingqing Zhang, Jielin Pan, Shang Cai, Yonghong Yan

doi:10.1121/1.4708075

Abstract

This paper describes advances in acoustic modeling for the Chinese spontaneous Conversational Telephone Speech (CTS) recognition task. Several acoustic modeling approaches were investigated, including Heteroscedastic Linear Discriminant Analysis (HLDA), Vocal Tract Length Normalization (VTLN), Gaussianization, Minimum Phone Error (MPE) training, and feature-space MPE (fMPE), among others. To account for pronunciation variation in continuous speech, the tones in the recognition vocabulary were modified according to the tone sandhi rule. The acoustic models were trained on over 200 hours of audio data from standard LDC corpora. Compared with the baseline acoustic models, the improved models reduce the Character Error Rate (CER) by about 25% relative on the standard LDC test set and the China 863 Program evaluation data set.

Acknowledgment: This work is partially supported by the National Natural Science Foundation of China (Nos. 10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319).
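The headline result is a relative reduction in CER, which is easy to confuse with an absolute one. The following is a minimal sketch of how CER and a relative reduction are conventionally computed, assuming the standard Levenshtein-distance definition of CER; the abstract does not describe the scoring tool used. The function names, the example sentence, and the 40% and 30% figures are hypothetical illustrations, not values from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer: float, improved_cer: float) -> float:
    """Relative error reduction of the improved system over the baseline."""
    return (baseline_cer - improved_cer) / baseline_cer


if __name__ == "__main__":
    # Hypothetical example: one substituted character in a six-character reference.
    print(cer("今天天气很好", "今天天汽很好"))
    # Hypothetical figures: a drop from 40% to 30% CER is a 25% relative reduction,
    # although it is only a 10-point absolute reduction.
    print(relative_reduction(0.40, 0.30))
```

Under this definition, a relative reduction of about 25% means the improved system makes roughly three quarters as many character errors as the baseline, whatever the baseline's absolute CER is.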
