Abstract

The rapid rise of platforms such as YouTube and Facebook has been driven by the spread of smartphones, tablets, and other electronic devices. These platforms collect massive volumes of data every second, which demands large-scale data processing. Because the data span several modalities, including text, audio, and video, multimodal sentiment classification and emotional computing are among the most actively researched fields today. Companies are striving to exploit this information by building automated systems for a variety of purposes, such as collecting customer feedback from user reviews, where the underlying challenge is to mine the user sentiment attached to a specific product or service. Solving such a complex problem at this data volume requires efficient and effective sentiment analysis tools. This study investigates sentiment analysis of videos, with data available in three modalities: audio, video, and text. Fusing these modalities remains a major open problem. The study introduces a novel approach to speaker-independent fusion that uses deep learning to fuse the modalities hierarchically, with the aim of improving over simple concatenation-based fusion.
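
To make the contrast between the two fusion strategies concrete, the following is a minimal sketch and not the study's actual model. It assumes each modality has already been reduced to a fixed-length feature vector of the same size, uses randomly initialized (untrained) dense layers, and fuses in two levels: first each pair of modalities, then the three pairwise results. The function names, dimensions, and the two-level layout are all hypothetical choices made for illustration.

```python
import math
import random


def dense(inputs, weights, bias):
    """One fully connected layer with tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def init_layer(rng, in_dim, out_dim):
    """Random, untrained weights; a real system would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def concat_fusion(audio, video, text):
    """Baseline: simple concatenation of the unimodal feature vectors."""
    return audio + video + text


def make_params(rng, dim, hidden):
    """Hypothetical parameter layout: three pairwise layers, one final layer."""
    return {
        "av": init_layer(rng, 2 * dim, hidden),
        "at": init_layer(rng, 2 * dim, hidden),
        "vt": init_layer(rng, 2 * dim, hidden),
        "avt": init_layer(rng, 3 * hidden, hidden),
    }


def hierarchical_fusion(audio, video, text, params):
    """Assumed two-level scheme: fuse pairs first, then fuse the pairs."""
    # Level 1: one learned projection per pair of modalities.
    av = dense(audio + video, *params["av"])
    at = dense(audio + text, *params["at"])
    vt = dense(video + text, *params["vt"])
    # Level 2: combine the three bimodal vectors into one trimodal vector.
    return dense(av + at + vt, *params["avt"])


rng = random.Random(0)
DIM, HIDDEN = 4, 3  # illustrative sizes only
params = make_params(rng, DIM, HIDDEN)
audio = [rng.random() for _ in range(DIM)]
video = [rng.random() for _ in range(DIM)]
text = [rng.random() for _ in range(DIM)]

baseline = concat_fusion(audio, video, text)
fused = hierarchical_fusion(audio, video, text, params)
print(len(baseline), len(fused))
```

The baseline simply grows with the number of input features, whereas the hierarchical variant passes every pair of modalities through its own projection before the final combination, so cross-modal interactions can be modeled at each level. Either vector would then feed a sentiment classifier, which is omitted here.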
