Research on Feature Extraction and Multimodal Fusion of Video Caption Based on Deep Learning

Hongjun Chen, Xueqin Wu, Hengyi Li

doi:10.1145/3380625.3380669

Publication Date: Jan 17, 2020

Citations: 1

Affiliation: Sichuan University, Chengdu Medical College, University of Electronic Science and Technology of China

Abstract

Video captioning describes the objects in a video, their attributes, and the relationships among them in natural language. It is a very challenging research topic in the fields of computer vision and multimedia. In this paper, deep learning methods are used to extract video frame features, motion information, and video sequence features. Two multimodal feature fusion methods, feature cascade and model weighted average fusion, are then studied, together with the evaluation of the resulting caption models. The experimental results show that the model using weighted average fusion scores higher than the feature cascade method on every evaluation metric. The feature extraction and multimodal fusion methods studied in this paper are of practical value for video captioning applications.
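The abstract names two fusion strategies but does not give their formulation. The following is a minimal sketch of one common reading of each, under stated assumptions: "feature cascade" is taken to mean concatenating the per-modality feature vectors (frame, motion, sequence) into one vector that feeds a single caption model (early fusion), and "model weighted average fusion" is taken to mean averaging the next-word probability distributions of separately trained per-modality models with fixed weights (late fusion). The function names `feature_cascade` and `weighted_average_fusion`, the weights, and all toy values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def feature_cascade(features: Sequence[Sequence[float]]) -> list[float]:
    """Early fusion: concatenate per-modality feature vectors into one vector.

    Assumption (not from the paper): "feature cascade" means plain
    concatenation of, e.g., frame, motion, and sequence features of one video.
    """
    fused: list[float] = []
    for vec in features:
        fused.extend(vec)
    return fused


def weighted_average_fusion(
    distributions: Sequence[Sequence[float]], weights: Sequence[float]
) -> list[float]:
    """Late fusion: weighted average of word distributions from separate models.

    Assumption (not from the paper): each model is trained on one modality and
    outputs a probability distribution over the same vocabulary at a decoding
    step. Weights are normalised so the result is again a distribution.
    """
    if len(distributions) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    vocab_size = len(distributions[0])
    if any(len(d) != vocab_size for d in distributions):
        raise ValueError("all models must share the same vocabulary")
    fused = [0.0] * vocab_size
    for dist, w in zip(distributions, weights):
        for i, p in enumerate(dist):
            fused[i] += (w / total) * p
    return fused


if __name__ == "__main__":
    # Hypothetical toy features: 2-D frame, 3-D motion, 1-D sequence feature.
    frame, motion, sequence = [0.1, 0.2], [0.3, 0.4, 0.5], [0.6]
    print(feature_cascade([frame, motion, sequence]))  # one 6-D fused vector

    # Hypothetical next-word distributions over a 3-word vocabulary.
    p_frame = [0.7, 0.2, 0.1]
    p_motion = [0.1, 0.6, 0.3]
    fused = weighted_average_fusion([p_frame, p_motion], weights=[0.6, 0.4])
    # Greedy choice: index of the most probable word after fusion.
    print(max(range(len(fused)), key=fused.__getitem__))
```

In a full captioning system the fused distribution would be recomputed at every decoding step. How the weights are chosen (for example, tuned on a validation set) is left open here, since the abstract does not say how the paper sets them.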