Abstract

Affect sensing is a rapidly growing field with the potential to revolutionize human–computer interaction, healthcare, and many other applications. Multimodal Sentiment Analysis (MSA) is a recent research area that exploits the multimodal nature of video data for affect sensing. However, the success of a multimodal framework depends on addressing the challenges of integrating diverse modalities and selecting informative features. We propose a novel multimodal representation learning framework that uses multimodal autoencoders to learn a comprehensive representation of the underlying heterogeneous modalities. Affect sensing is even more challenging in low-resource languages, where annotated video datasets and language-specific models are scarce. To address this concern, we introduce the Multimodal Sentiment Analysis Corpus in Tamil (MSAT), a small dataset for MSA in the Tamil language, and show how a novel cross-lingual transfer learning technique in a multimodal setting leverages the knowledge gained by training the model on a larger English MSA dataset and fine-tuning it on the much smaller Tamil MSA dataset. Our transfer learning model achieves significant gains on the Tamil dataset by a large margin. Our experiments demonstrate that efficient, generalized models for low-resource languages can be built by leveraging existing MSA datasets.
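
To make the two core ideas concrete, the following is a minimal sketch of a multimodal autoencoder with a shared latent space, plus the pretrain-then-fine-tune transfer loop the abstract describes. It assumes pre-extracted text, audio, and visual feature vectors; all dimensions, layer sizes, and the `train_step` helper are illustrative assumptions, not the paper's actual architecture.

```python
# Hedged sketch: per-modality encoders feed a fused latent, and
# per-modality decoders reconstruct each input from that latent,
# so the latent must retain information from all three modalities.
import torch
import torch.nn as nn

class MultimodalAutoencoder(nn.Module):
    # Feature dimensions below are assumptions for illustration.
    def __init__(self, text_dim=768, audio_dim=74, visual_dim=35, latent_dim=128):
        super().__init__()
        self.enc_text = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU())
        self.enc_audio = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU())
        self.enc_visual = nn.Sequential(nn.Linear(visual_dim, 256), nn.ReLU())
        # Fusion layer compresses concatenated encodings into one
        # joint representation of the heterogeneous modalities.
        self.fuse = nn.Linear(3 * 256, latent_dim)
        self.dec_text = nn.Linear(latent_dim, text_dim)
        self.dec_audio = nn.Linear(latent_dim, audio_dim)
        self.dec_visual = nn.Linear(latent_dim, visual_dim)

    def forward(self, t, a, v):
        z = self.fuse(torch.cat(
            [self.enc_text(t), self.enc_audio(a), self.enc_visual(v)], dim=-1))
        return z, self.dec_text(z), self.dec_audio(z), self.dec_visual(z)

# Cross-lingual transfer, sketched: run this loop first on the large
# English MSA corpus, then reuse the same weights (smaller learning
# rate) on the small Tamil MSAT corpus.
model = MultimodalAutoencoder()
head = nn.Linear(128, 1)  # sentiment regression head (assumed)
opt = torch.optim.Adam(
    list(model.parameters()) + list(head.parameters()), lr=1e-4)
recon_loss, sent_loss = nn.MSELoss(), nn.L1Loss()

def train_step(t, a, v, label):
    z, rt, ra, rv = model(t, a, v)
    # Reconstruction terms shape the shared latent; the sentiment
    # term supervises it toward the MSA label.
    loss = (recon_loss(rt, t) + recon_loss(ra, a) + recon_loss(rv, v)
            + sent_loss(head(z).squeeze(-1), label))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The design choice this illustrates is that fine-tuning reuses every fused-representation parameter learned from English data, so the Tamil model only needs to adapt an already informative latent space rather than learn one from its small corpus.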
