Abstract
• An advanced and efficient image-text multimodal fusion approach.
• Matrix transformations align the features of different modalities.
• An attention mechanism keeps the model parallelizable and improves training speed.
• Unimodal and multimodal features are concatenated so that the modalities complement one another.
• The difficulty of obtaining domain-specific datasets is addressed by constructing a generic dataset.

Processing rich media information with Artificial Intelligence (AI) has become an important part of Industry 4.0. Sentiment recognition in AI aims to analyze the user emotions contained in rich media in order to improve services. Previous research on sentiment recognition has focused mainly on academic settings; few studies discuss algorithmic applications and innovations in industry. In this paper, we propose a general approach to multimodal sentiment recognition for images and text. The method offers a new way to process rich media information by fully considering the internal features of each modality as well as the correlations between modalities. On the dataset constructed in this paper, accuracy improves by more than 4% compared with single-modality methods. Extending the experiments to a public multimodal dataset demonstrates the effectiveness and generality of the method.
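The abstract describes three architectural ingredients: matrix transformations that align image and text features, attention-based fusion for parallelism, and concatenation of unimodal with multimodal features. A minimal PyTorch sketch of that pipeline is given below; all dimensions, layer choices, and names (e.g., `ImageTextFusion`, `shared_dim`) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ImageTextFusion(nn.Module):
    """Hypothetical sketch of the fusion pipeline sketched in the abstract:
    (1) matrix (linear) transformations align image and text features into a
    shared space, (2) multi-head attention fuses the aligned features and
    processes all positions in parallel, and (3) unimodal and multimodal
    features are concatenated so the modalities remain complementary."""

    def __init__(self, img_dim=2048, txt_dim=768, shared_dim=512, num_classes=3):
        super().__init__()
        # Matrix transformations mapping each modality into a shared space.
        self.img_proj = nn.Linear(img_dim, shared_dim)
        self.txt_proj = nn.Linear(txt_dim, shared_dim)
        # Attention-based fusion; attention has no recurrent dependencies,
        # which is what enables the parallelism / training-speed claim.
        self.cross_attn = nn.MultiheadAttention(shared_dim, num_heads=8,
                                                batch_first=True)
        # Classifier over concatenated unimodal + multimodal features.
        self.classifier = nn.Linear(shared_dim * 3, num_classes)

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, N_img, img_dim); txt_feats: (B, N_txt, txt_dim)
        img = self.img_proj(img_feats)  # align image features
        txt = self.txt_proj(txt_feats)  # align text features
        # Text tokens attend to image regions (one plausible fusion direction).
        fused, _ = self.cross_attn(query=txt, key=img, value=img)
        # Pool each stream, then stitch unimodal + multimodal together.
        combined = torch.cat([img.mean(1), txt.mean(1), fused.mean(1)], dim=-1)
        return self.classifier(combined)

# Example usage with random tensors standing in for encoder outputs.
model = ImageTextFusion()
logits = model(torch.randn(2, 49, 2048), torch.randn(2, 32, 768))
print(logits.shape)  # torch.Size([2, 3])
```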