Abstract

Affect sensing is a rapidly growing field with the potential to revolutionize human–computer interaction, healthcare, and many other applications. Multimodal Sentiment Analysis (MSA) is a recent research area that exploits the multimodal nature of video data for affect sensing. However, the success of a multimodal framework depends on addressing the challenges of integrating diverse modalities and selecting informative features. We propose a novel multimodal representation learning framework that uses multimodal autoencoders to learn a comprehensive representation of the underlying heterogeneous modalities. Affect sensing is even more challenging in low-resource languages, where annotated video datasets and language-specific models are scarce. To address this concern, we introduce the Multimodal Sentiment Analysis Corpus in Tamil (MSAT), a small dataset for Tamil MSA, and show how a novel cross-lingual transfer learning technique in a multimodal setting leverages the knowledge gained by training the model on a larger English MSA dataset to fine-tune it on the much smaller Tamil dataset. Our transfer learning model improves performance on the Tamil dataset by a large margin. Our experiments demonstrate that efficient, generalized models for low-resource languages can be built by exploiting existing MSA datasets.
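The abstract names two techniques but gives no implementation details. The PyTorch sketch below illustrates both ideas under stated assumptions: a multimodal autoencoder that fuses per-modality encodings into a joint representation used for sentiment prediction, followed by a two-stage pretrain/fine-tune loop. All layer sizes, feature dimensions, the loss weighting `alpha`, the learning rates, and the `toy_loader` helper are illustrative assumptions, not the paper's actual architecture or training recipe.

```python
import torch
import torch.nn as nn

class MultimodalAutoencoder(nn.Module):
    """Multimodal autoencoder sketch: per-modality encoders, a fused
    joint representation, per-modality decoders, and a sentiment head.
    All dimensions are illustrative assumptions, not the paper's."""

    def __init__(self, text_dim=768, audio_dim=74, visual_dim=35, joint_dim=128):
        super().__init__()
        # Modality-specific encoders (sizes assumed for illustration)
        self.enc_text = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU())
        self.enc_audio = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.enc_visual = nn.Sequential(nn.Linear(visual_dim, 64), nn.ReLU())
        # Fuse the concatenated encodings into one joint representation
        self.fuse = nn.Linear(256 + 64 + 64, joint_dim)
        # Decoders reconstruct each modality from the joint code
        self.dec_text = nn.Linear(joint_dim, text_dim)
        self.dec_audio = nn.Linear(joint_dim, audio_dim)
        self.dec_visual = nn.Linear(joint_dim, visual_dim)
        # Regression head predicting a sentiment score from the joint code
        self.head = nn.Linear(joint_dim, 1)

    def forward(self, text, audio, visual):
        z = self.fuse(torch.cat([self.enc_text(text),
                                 self.enc_audio(audio),
                                 self.enc_visual(visual)], dim=-1))
        recons = (self.dec_text(z), self.dec_audio(z), self.dec_visual(z))
        return self.head(z), recons

def train_epoch(model, loader, optimizer, alpha=0.5):
    """One epoch over (text, audio, visual, label) batches, combining the
    sentiment loss with the reconstruction loss (alpha is an assumption)."""
    mse = nn.MSELoss()
    for text, audio, visual, label in loader:
        pred, (rt, ra, rv) = model(text, audio, visual)
        loss = mse(pred.squeeze(-1), label) + alpha * (
            mse(rt, text) + mse(ra, audio) + mse(rv, visual))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def toy_loader(batches=4, batch=8):
    # Synthetic batches standing in for real English/Tamil data loaders
    return [(torch.randn(batch, 768), torch.randn(batch, 74),
             torch.randn(batch, 35), torch.randn(batch))
            for _ in range(batches)]

# Cross-lingual transfer (sketch): pretrain on the large English MSA
# dataset, then fine-tune the same weights on the small Tamil MSAT set,
# here with a lower learning rate (an assumption, not the paper's recipe).
model = MultimodalAutoencoder()
train_epoch(model, toy_loader(), torch.optim.Adam(model.parameters(), lr=1e-3))
train_epoch(model, toy_loader(), torch.optim.Adam(model.parameters(), lr=1e-4))
```

In the real setting, the toy loaders would be replaced by loaders over the English MSA corpus and the Tamil MSAT corpus, with the same model weights carried from the pretraining stage into the fine-tuning stage.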
