Speech Recognition of Oral English Teaching Based on Deep Belief Network

Jianmei Wang

doi:10.3991/ijet.v15i10.14041

Abstract

Oral English teaching faces two common problems: teaching methods are inefficient, and learners have poor oral English skills. The development of computer-aided language learning offers a possible solution to these problems. Drawing on speech recognition, cloud computing and deep learning techniques, this paper applies the deep belief network (DBN) to recognize speech in oral English teaching, and establishes a multi-parameter evaluation model for the pronunciation quality of oral English among college students. The model combines the merits of subjective and objective evaluations, and assesses pronunciation from four aspects: pitch, speech rate, rhythm and intonation. Finally, the proposed model was verified through speech recognition and pronunciation evaluation experiments on 26 non-English majors from a college. The results show that the proposed evaluation model outputs credible results that are consistent with those of experts, as evidenced by consistency, neighbourhood consistency and the Pearson correlation coefficient. The research provides a feasible way to evaluate the oral English proficiency of learners, laying the basis for improving the teaching and learning efficiency of oral English.
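The agreement measures named in the abstract can be illustrated with a minimal sketch. The definitions used here are assumptions, since the abstract does not spell them out: consistency is taken as the share of samples where the model and the expert assign the same grade, and neighbourhood consistency as the share where the two grades differ by at most one level. The function names and the integer grade scale are hypothetical.

```python
def consistency(machine, expert):
    """Share of samples where both graders assign the same grade (assumed definition)."""
    if len(machine) != len(expert) or not machine:
        raise ValueError("need two non-empty grade lists of equal length")
    return sum(m == e for m, e in zip(machine, expert)) / len(machine)


def neighbourhood_consistency(machine, expert):
    """Share of samples where the grades differ by at most one level (assumed definition)."""
    if len(machine) != len(expert) or not machine:
        raise ValueError("need two non-empty grade lists of equal length")
    return sum(abs(m - e) <= 1 for m, e in zip(machine, expert)) / len(machine)


# Illustrative grades only, not data from the paper.
machine_grades = [4, 3, 5, 2, 4]
expert_grades = [4, 4, 5, 2, 2]

exact = consistency(machine_grades, expert_grades)
adjacent = neighbourhood_consistency(machine_grades, expert_grades)
```

Under these assumed definitions, neighbourhood consistency can never be lower than consistency, because every exact match also counts as a match within one level.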

Highlights

In the era of economic globalization, trade is booming across borders
(1) Pitch evaluation: In this study, the Mel frequency cepstral coefficient (MFCC) was used as the evaluation standard of pitch. Specifically, the MFCC feature parameters of the test speech and the standard speech are extracted, and the MFCC feature correlation coefficients are combined with the deep belief network (DBN)-based speech recognition model to recognize the speech and evaluate the pitch of English learners
Speech recognition and pronunciation evaluation technologies are the core of computer-aided language learning
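The pitch-evaluation highlight compares MFCC features of the test speech with those of the standard speech through correlation coefficients. The following is a minimal sketch of that comparison only, under stated assumptions: the MFCC matrices are already extracted (one list of coefficients per frame), the two utterances are already time-aligned with the same number of frames, and the per-frame correlations are simply averaged. The function names and the averaging step are hypothetical and are not taken from the paper; the DBN recognition stage is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def mfcc_correlation(test_mfcc, standard_mfcc):
    """Mean per-frame correlation between two aligned MFCC matrices.

    Assumption: frames are already aligned, so frame i of the test
    speech corresponds to frame i of the standard speech.
    """
    if len(test_mfcc) != len(standard_mfcc) or not test_mfcc:
        raise ValueError("need two non-empty MFCC matrices with equal frame counts")
    scores = [pearson(t, s) for t, s in zip(test_mfcc, standard_mfcc)]
    return sum(scores) / len(scores)


# Toy MFCC frames for illustration only, not real speech features.
standard = [[1.0, 2.0, 3.0], [2.0, 4.0, 7.0]]
test = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.5]]
score = mfcc_correlation(test, standard)
```

In this toy case every test frame is a scaled copy of the matching standard frame, so the score is close to 1. Real utterances differ in length, so a practical system would need an alignment step before this comparison.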

Summary

Introduction

As an international language, English has achieved unprecedented importance globally. Many people try to learn oral English by listening to and repeating audio-visual materials on mobile phones and MP3 players. However, these devices cannot evaluate the learner’s pronunciation or give instruction on it. Against this backdrop, many colleges around the world have organized oral communication internships and interactive language programs, and developed speech recognition/scoring techniques, providing learners great. Based on the above analysis, this paper briefly introduces relevant theories on speech signal pre-processing, feature extraction, and deep learning (DL) networks, and applies the deep belief network (DBN) to recognize speech in oral English teaching. A multi-parameter evaluation model was established for the pronunciation quality of oral English among college students, and verified through simulation experiments.

Speech signal preprocessing and feature extraction

Deep learning and neural networks

Multi-parameter pronunciation quality evaluation

Simulation experiment and result analysis

Conclusion

Author