Abstract

With the acceleration of global integration, the demand for English instruction is rising steadily. However, Chinese English learners struggle to acquire spoken English because of the limited English-learning environment and teaching conditions in China. Advances in artificial intelligence and in language teaching and learning techniques have ushered in a new era of language learning, and deep learning technology makes it possible to address this problem. Speech recognition and assessment technology are at the heart of computer-assisted language learning, with speech recognition as the foundation. Because speech pronunciation varies in complex ways, speech signal data are voluminous, speech feature parameters are high-dimensional, and recognition and evaluation are computationally expensive, speech signal processing places heavy demands on hardware, software, and algorithms. Traditional speech recognition approaches, such as dynamic time warping, hidden Markov models, and artificial neural networks, each have advantages and disadvantages, and they have reached bottlenecks that make further gains in accuracy and speed difficult. To address these problems, this paper focuses on evaluating the effect of multimedia teaching in college English. A multilevel residual convolutional neural network algorithm for oral English pronunciation recognition is proposed, built on a deep convolutional neural network. Experiments show that the algorithm can help learners identify inconsistencies between their pronunciation and standard pronunciation and correct pronunciation errors, resulting in improved oral English learning performance.

Highlights

  • Introduction: The demand for English learning [1] in China is rising steadily due to global integration and China’s increasing degree of internationalization. The large differences between Chinese and English pronunciation, together with time and location constraints, contribute to the lack of a domestic English learning environment

  • Artificial intelligence (AI)-enabled learning [3, 4] has addressed this problem with the advancement of computer science [5] and technology [6] and improvements in language teaching [7, 8] and learning methods. This technology will disrupt the current language teaching and learning environment, allowing learners to learn independently and in any place

  • Based on the above observations, this paper takes the evaluation of college English multimedia teaching effects as its primary research content. Therefore, a multilevel residual convolutional neural network is proposed. The proposed scheme recognizes spoken English pronunciation based on a deep convolutional neural network [28-32]. The proposed algorithm has been tested and can help learners distinguish between their own and standard pronunciation, correct pronunciation errors, and improve the efficiency of oral English learning (a sketch of such a residual block follows below). The paper’s key contributions are as follows: (1) based on a deep convolutional neural network, a multimedia-based English teaching [33, 34] impact evaluation model is proposed
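The summary does not give the exact layer configuration of the multilevel residual network, so the following is only a minimal sketch of one level of such a residual convolutional block, assuming a PyTorch implementation over spectrogram-like input; the class name ResidualBlock, the channel counts, and the kernel sizes are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of one level of a multilevel residual CNN; sizes are assumptions.
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual convolutional block over spectrogram-like input."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection so the skip connection matches the block's output shape
        self.skip = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.skip = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.skip(x))  # residual (identity) connection

# Example: two levels stacked over a single-channel spectrogram input
# net = nn.Sequential(ResidualBlock(1, 32), ResidualBlock(32, 64, stride=2))
```

Stacking several such blocks at increasing depth is what the term "multilevel" suggests here; the skip connections let the deeper levels be trained without the degradation that plain deep CNNs suffer from.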

Summary

Related Work

Different aspects of related work have been studied, including language recognition, speech processing, and feature extraction mechanisms. The raw speech signal contains a great deal of information that interferes with the semantics, owing to differences among speakers, loudness, and the duration of the sound, so it is not suitable for direct use in speech processing. Feature parameters must therefore be extracted from the original speech signal; ideal speech features describe only the semantic information, and the total amount of data is small. First, the speech signal is preprocessed by pre-emphasis: the result y(n) can be expressed in terms of the input speech signal x(n) as y(n) = x(n) − z·x(n − 1), where z is the pre-emphasis coefficient, taken as 0.9375 in this paper. The characteristic parameters of the preprocessed speech signal are then extracted.
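As a concrete illustration of this pre-emphasis step, here is a minimal NumPy sketch, assuming a one-dimensional speech signal x; the function name pre_emphasis is illustrative, and only the filter y(n) = x(n) − z·x(n − 1) with z = 0.9375 comes from the text.

```python
import numpy as np

def pre_emphasis(x, z=0.9375):
    """Apply the pre-emphasis filter y(n) = x(n) - z * x(n - 1) to a 1-D speech signal."""
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    y[0] = x[0]                 # no previous sample exists for n = 0
    y[1:] = x[1:] - z * x[:-1]  # boosts high frequencies before feature extraction
    return y
```

Pre-emphasis flattens the spectral tilt of speech, which makes the subsequent feature parameters less dominated by low-frequency energy.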
