Abstract
Artificial intelligence is advancing rapidly, and its application to adolescent education continues to raise new challenges and opportunities. Educational systems increasingly require automated detection and evaluation of teaching activities, which offers a fresh perspective on improving the quality of adolescent education. Although large-scale models are receiving significant attention in educational research, their high computational cost and their limitations in specific applications constrain their widespread use in analyzing educational video content, especially when handling multimodal data. Current multimodal contrastive learning methods, which integrate video, audio, and text information, have achieved some success in video–text retrieval tasks. However, these methods typically rely on simple weighted fusion strategies and do not avoid noise and information redundancy. We therefore propose a novel network framework, CLIP2TF, which includes an efficient audio–visual fusion encoder. The encoder lets visual and audio features interact and integrates them dynamically, enhancing visual features that may be missing or insufficient in specific teaching scenarios while reducing the transfer of redundant information during modality fusion. Ablation experiments on the MSRVTT and MSVD datasets first demonstrate the effectiveness of CLIP2TF in video–text retrieval tasks. Subsequent tests on teaching video datasets further confirm the applicability of the proposed method. This research not only showcases the potential of artificial intelligence in the automated assessment of teaching quality but also provides new directions for research in related fields.