Abstract

In this paper, an improved semantic learning model based on multimodal data fusion is proposed. A co-attention learning network architecture jointly models image attention features and question attention features, which effectively reduces interference from irrelevant features and extracts more discriminative representations of images and questions. In addition, to address the high dimensionality and computational complexity of the multimodal fusion process, a multimodal bilinear decomposition method is used to fuse the visual features of images with the textual features of questions more effectively and to capture richer interactions between modalities. Compared with TF-IDF and TextRank, the accuracy of the proposed model is 9.7% and 12.1% higher, respectively. In an experiment where international students in Chinese language classes were played listening materials combining audio, video, and Chinese characters, the difference between the first and second groups of scores was significant (F = 16.15, p < .05), as was the difference between the second and third groups (F = 8.527, p < .05).
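The bilinear decomposition fusion mentioned above can be illustrated with a minimal sketch. This assumes a factorized bilinear pooling scheme (project each modality into a shared low-rank space, take the element-wise product, sum-pool over the factors, then apply power and L2 normalization); the dimensions, the function name `mfb_fuse`, and the random matrices standing in for learned weights are all hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d_img, d_txt = 8, 6   # hypothetical image / question feature dims
k, o = 5, 4           # assumed factor count and fused output dim

# Random projections stand in for learned weight matrices.
U = rng.standard_normal((d_img, k * o))
V = rng.standard_normal((d_txt, k * o))

def mfb_fuse(x, q):
    """Factorized bilinear fusion (sketch).

    Projects each modality into a shared (k * o)-dim space, takes the
    element-wise product, sum-pools every k factors into one output
    unit, then applies signed square-root and L2 normalization.
    """
    joint = (x @ U) * (q @ V)                 # element-wise interaction
    pooled = joint.reshape(o, k).sum(axis=1)  # sum-pool over k factors
    signed = np.sign(pooled) * np.sqrt(np.abs(pooled))  # power norm
    return signed / (np.linalg.norm(signed) + 1e-12)    # L2 norm

x = rng.standard_normal(d_img)  # image feature vector
q = rng.standard_normal(d_txt)  # question feature vector
z = mfb_fuse(x, q)              # fused representation, shape (o,)
```

The low-rank factorization is what keeps this tractable: a full bilinear interaction would need a d_img x d_txt x o tensor, while the factorized form only needs the two projection matrices.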
