Abstract

Most sentiment analysis studies focus on classifying the sentiment of individual video frames, ignoring the spatio-temporal information in the frame sequence as well as the accompanying text and audio. Multiple kernel learning, an active research topic in kernel-based machine learning, can handle multiple modalities. However, multiple kernel learning tends to ignore base features that are not discriminative, and so cannot make full use of the base features of the different modalities. This paper proposes a novel multi-modal fusion model for sentiment analysis, in which a multiple kernel learning algorithm based on a convolution margin-dimension constraint is used for feature fusion. A 3D convolutional neural network extracts features from the visual information, and the multiple kernel learning algorithm based on the margin-dimension constraint fuses the visual, text, and audio sentiment features. Experiments on the MOUD and IEMOCAP sentiment databases show that the proposed model outperforms existing models for multi-modal sentiment analysis.
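To make the idea of kernel-level fusion concrete, the following is a minimal sketch of how one base kernel per modality can be combined into a single fused kernel. It is not the margin-dimension constrained algorithm proposed in this paper: the weights here come from a simple kernel-target alignment heuristic, chosen only for illustration. The function names (`rbf_kernel`, `alignment`, `combine_kernels`), the feature dimensions, the `gamma` values, and the synthetic data are all assumptions and do not come from the paper.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Clip tiny negative distances caused by floating-point error.
    return np.exp(-gamma * np.maximum(d2, 0.0))

def alignment(K, y):
    """Kernel-target alignment between K and the ideal kernel y y^T."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

def combine_kernels(kernels, y):
    """Weight each base kernel by its (non-negative) alignment with the labels.

    Illustrative heuristic only; the paper's algorithm learns the weights
    under a margin-dimension constraint instead.
    """
    scores = np.array([max(alignment(K, y), 0.0) for K in kernels])
    if scores.sum() == 0.0:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    else:
        weights = scores / scores.sum()
    fused = sum(w * K for w, K in zip(weights, kernels))
    return fused, weights

# Synthetic stand-ins for per-modality features (hypothetical dimensions).
rng = np.random.default_rng(0)
n = 40
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
features = {
    "visual": rng.normal(size=(n, 16)) + 0.5 * y[:, None],  # e.g. 3D-CNN output
    "text": rng.normal(size=(n, 8)) + 0.3 * y[:, None],
    "audio": rng.normal(size=(n, 6)),                        # uninformative here
}

kernels = [rbf_kernel(X, gamma=1.0 / X.shape[1]) for X in features.values()]
K, weights = combine_kernels(kernels, y)

for name, w in zip(features, weights):
    print(f"{name}: weight {w:.3f}")
```

The fused matrix `K` is a convex combination of positive semi-definite kernels, so it is itself a valid kernel and could be passed to any kernel classifier that accepts a precomputed Gram matrix. Note that a heuristic of this kind tends to down-weight a weakly discriminative modality, which is the behaviour the abstract identifies as a limitation of multiple kernel learning.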
