Abstract

With the availability of a huge amount of multimodal content on the internet, multimodal sentiment classification and emotion detection have become heavily researched topics. Feature selection, context extraction, and multimodal fusion are the most important challenges in multimodal sentiment classification and affective computing. To address these challenges, this paper presents a multilevel feature optimization and multimodal contextual fusion technique. Evolutionary-computing-based feature selection models extract a subset of features from each modality. The contextual information between neighboring utterances is extracted using bidirectional long short-term memory (BiLSTM) networks at multiple levels. First, bimodal fusion is performed by fusing two modalities at a time; finally, trimodal fusion combines all three modalities. The proposed method is evaluated on two publicly available datasets: CMU-MOSI for sentiment classification and IEMOCAP for affective computing. By incorporating the selected feature subsets and contextual information, the proposed model improves classification accuracy over two standard baselines by more than 3% for sentiment classification and 6% for emotion classification.
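The sketch below illustrates, under stated assumptions, the multilevel contextual fusion idea described above: per-modality features (assumed to come from the evolutionary feature selection stage, which is not shown) are passed through contextual BiLSTMs, modality pairs are fused at the bimodal level, and the bimodal outputs are combined at the trimodal level before classification. This is a minimal PyTorch illustration, not the paper's implementation; the class names (ContextBiLSTM, HierarchicalFusion), feature dimensions, and the choice to feed bimodal outputs into the trimodal stage are assumptions for demonstration.

import torch
import torch.nn as nn

class ContextBiLSTM(nn.Module):
    """Hypothetical helper: extracts contextual utterance representations with a BiLSTM."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, x):
        # x: (batch, num_utterances, in_dim) -> (batch, num_utterances, 2 * hidden_dim)
        out, _ = self.rnn(x)
        return out

class HierarchicalFusion(nn.Module):
    """Assumed sketch: unimodal context -> bimodal fusion -> trimodal fusion -> classifier."""
    def __init__(self, dims, hidden_dim, num_classes):
        super().__init__()
        d_t, d_a, d_v = dims  # text, audio, video feature sizes after feature selection
        self.uni = nn.ModuleDict({
            'text': ContextBiLSTM(d_t, hidden_dim),
            'audio': ContextBiLSTM(d_a, hidden_dim),
            'video': ContextBiLSTM(d_v, hidden_dim),
        })
        bi_in = 4 * hidden_dim  # concatenation of two unimodal BiLSTM outputs
        self.bi = nn.ModuleDict({
            'ta': ContextBiLSTM(bi_in, hidden_dim),
            'tv': ContextBiLSTM(bi_in, hidden_dim),
            'av': ContextBiLSTM(bi_in, hidden_dim),
        })
        self.tri = ContextBiLSTM(6 * hidden_dim, hidden_dim)  # three bimodal outputs concatenated
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, text, audio, video):
        # Unimodal contextual features
        t = self.uni['text'](text)
        a = self.uni['audio'](audio)
        v = self.uni['video'](video)
        # Bimodal fusion: two modalities at a time
        ta = self.bi['ta'](torch.cat([t, a], dim=-1))
        tv = self.bi['tv'](torch.cat([t, v], dim=-1))
        av = self.bi['av'](torch.cat([a, v], dim=-1))
        # Trimodal fusion over the bimodal representations
        tri = self.tri(torch.cat([ta, tv, av], dim=-1))
        return self.classifier(tri)  # per-utterance class logits

# Toy usage: a batch of 2 videos, each with 20 utterances and illustrative feature sizes.
model = HierarchicalFusion(dims=(100, 73, 512), hidden_dim=64, num_classes=2)
logits = model(torch.randn(2, 20, 100), torch.randn(2, 20, 73), torch.randn(2, 20, 512))
print(logits.shape)  # torch.Size([2, 20, 2])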
