Abstract

Using computers to help people practice spoken language is a common approach, but current methods have two problems. First, because fluency features are computed from rules based on expert knowledge, key information contained in the original data may be lost. Second, the parameters of each model are optimized separately, which leaves the overall system in a sub-optimal state. To solve these problems, this paper proposes a spoken English fluency scoring method based on a convolutional neural network (CNN). To make feature extraction capture the short-, medium-, and long-term characteristics of the speech signal, three convolutional layers are stacked, and the feature extraction and scoring models are learned jointly from the raw time-domain signal. In the feature extraction process, principal component analysis (PCA) is applied to extract the useful information from the audio features. Experimental results show that the proposed method produces more accurate scores.
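The abstract does not state the layer sizes, but the idea that stacking three convolutional layers covers short-, medium-, and long-term context can be made concrete with receptive-field arithmetic. The sketch below assumes a 16 kHz waveform and hypothetical kernel sizes and strides (the `Conv1D` class and `LAYERS` values are illustrative, not the paper's configuration); it computes how many raw input samples a single output of each layer depends on.

```python
from dataclasses import dataclass
from typing import List

SAMPLE_RATE_HZ = 16_000  # assumed sampling rate of the raw waveform


@dataclass(frozen=True)
class Conv1D:
    """Hypothetical 1-D convolution layer, described only by kernel size and stride."""
    kernel: int  # number of input positions each output looks at
    stride: int  # step between consecutive outputs


def receptive_fields(layers: List[Conv1D]) -> List[int]:
    """Receptive field (in raw input samples) of one output of each stacked layer."""
    fields = []
    rf = 1    # before any layer, a position sees exactly one sample
    jump = 1  # raw-sample distance between adjacent positions at the current layer's input
    for layer in layers:
        rf += (layer.kernel - 1) * jump  # each extra kernel tap reaches `jump` samples further
        jump *= layer.stride             # striding spreads the next layer's inputs further apart
        fields.append(rf)
    return fields


def samples_to_ms(samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Convert a length in samples to milliseconds."""
    return samples * 1000 / sample_rate


# Hypothetical configuration: each layer widens the temporal context five-fold.
LAYERS = [
    Conv1D(kernel=400, stride=160),  # 25 ms window, 10 ms hop on the raw waveform
    Conv1D(kernel=11, stride=5),
    Conv1D(kernel=11, stride=5),
]

if __name__ == "__main__":
    for depth, rf in enumerate(receptive_fields(LAYERS), start=1):
        print(f"layer {depth}: {rf} samples = {samples_to_ms(rf):.0f} ms")
```

Under these assumed settings the three layers see 25 ms, 125 ms, and 625 ms of audio respectively, roughly phone-, syllable-, and word/phrase-scale spans. Because all three layers and the scoring head would be trained together in an end-to-end model, the features at each scale are learned from data rather than fixed by hand.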
