Abstract

This paper presents a multimodal fusion method for English-Chinese bilingual teaching. A BERT model extracts text features to capture semantic information more accurately. FBank (filter bank) features are extracted from speech data to improve speech recognition accuracy. A 3D-CNN model is built to extract image features and obtain richer visual information. Long short-term memory (LSTM) networks are combined with attention mechanisms to strengthen interaction and integration across modalities. The results show that the multimodal teaching approach improves students’ motivation to learn and significantly enhances classroom teaching effectiveness. In vocabulary teaching in particular, multimodal resources deepen students’ understanding and retention of vocabulary. In an empirical analysis of pronunciation detection in noisy environments, the model achieves high accuracy, especially after multimodal features are incorporated. The findings indicate that multimodal teaching engages more of students’ senses and thereby improves learning, particularly the efficiency of vocabulary learning. The methods and findings are therefore of theoretical and practical significance to bilingual education.
