Abstract

Compared with using a single modality, exploiting multi-modal information from text, video, and audio can lead to more accurate sentiment analysis. GME-Dialogue-NET, a gated multi-modal sentiment analysis model, is proposed for multi-modal emotion prediction and sentiment analysis. The model uses GME (Gated Multi-modal Embedding) to judge whether the audio or video modality is noise and then accepts or rejects that modality's information accordingly. An attention mechanism over context vectors allocates more attention to the context utterances most relevant to the current sentence. GME-Dialogue-NET divides the participants of a dialogue into speaker and listener to better capture the dependence between emotion and state. It further proposes a fusion mechanism, CPA (Circulant-Pairwise Attention), which attends to different modalities to different degrees to obtain more informative emotional and sentimental representations for emotion prediction and sentiment analysis. Compared with current models, both the weighted accuracy and the F1 score of emotion prediction are improved, especially for the three emotions of sadness, anger, and excitement. In the sentiment regression task, compared with the advanced model Multilogue-Net, the MAE (mean absolute error) of GME-Dialogue-NET decreases by 0.1 and the Pearson correlation coefficient (r) rises by 0.11.

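To make the gating idea concrete, below is a minimal PyTorch sketch of a gated multi-modal embedding: a sigmoid gate, conditioned jointly on the text and a non-text modality, decides how much of the audio or video representation to admit. The class name GatedModalEmbedding, the feature dimensions, and the exact gating form are assumptions for illustration only, not the authors' implementation of GME.

```python
# Hypothetical sketch of a gated multi-modal embedding (not the paper's code).
# A gate near 0 treats the non-text modality as noise and suppresses it;
# a gate near 1 accepts the modality's information alongside the text.
import torch
import torch.nn as nn


class GatedModalEmbedding(nn.Module):
    """Gate one non-text modality (audio or video) against the text modality."""

    def __init__(self, text_dim: int, modal_dim: int, out_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, out_dim)           # text into a shared space
        self.modal_proj = nn.Linear(modal_dim, out_dim)         # modality into the same space
        self.gate = nn.Linear(text_dim + modal_dim, out_dim)    # gate conditioned on both inputs

    def forward(self, text: torch.Tensor, modal: torch.Tensor) -> torch.Tensor:
        # Element-wise accept/reject decision for the modality's contribution.
        g = torch.sigmoid(self.gate(torch.cat([text, modal], dim=-1)))
        return self.text_proj(text) + g * torch.tanh(self.modal_proj(modal))


if __name__ == "__main__":
    # Toy usage with assumed feature sizes: a batch of 4 utterances.
    text_feat = torch.randn(4, 300)   # assumed text feature size
    audio_feat = torch.randn(4, 74)   # assumed audio feature size
    gme = GatedModalEmbedding(text_dim=300, modal_dim=74, out_dim=128)
    fused = gme(text_feat, audio_feat)
    print(fused.shape)  # torch.Size([4, 128])
```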