Abstract
As an emerging research area in natural language processing, multi-modal human language analysis spans the language, vision, and audio modalities. Understanding multi-modal language requires modeling not only the independent dynamics within each modality (intra-modal dynamics) but also, more importantly, the interactive dynamics among different modalities (inter-modal dynamics). In this paper, we propose a hierarchical approach to multi-modal language analysis with two levels of attention: interaction-level attention, which captures intra-modal and inter-modal dynamics across modalities using multiple types of attention, and selection-level attention, which selects effective representations for the final prediction by computing the importance of each vector obtained at the interaction level. Empirical evaluation demonstrates the effectiveness of the proposed approach on multi-modal sentiment classification, sentiment regression, and emotion recognition.
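To make the two-level idea concrete, the following is a minimal NumPy sketch, not the paper's actual architecture: an interaction level that forms a fused vector for every ordered pair of modalities (intra-modal when the pair is a modality with itself, inter-modal otherwise), followed by a selection level that softmax-weights those vectors by importance. The bilinear matrix `W`, the scoring vector `v`, and the averaging fusion rule are illustrative assumptions, standing in for whatever learned attention the paper uses.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 8  # shared feature dimension (assumed)
rng = np.random.default_rng(0)

# Hypothetical per-modality feature vectors, each already projected to
# the shared dimension d (stand-ins for real encoder outputs).
modalities = {m: rng.standard_normal(d) for m in ("language", "vision", "audio")}

# Interaction level: score every ordered pair of modalities with a
# bilinear form; (m, m) pairs model intra-modal dynamics, (m, n) pairs
# with m != n model inter-modal dynamics. W is a hypothetical parameter.
W = rng.standard_normal((d, d))
interaction_vecs = []
for m, hm in modalities.items():
    for n, hn in modalities.items():
        score = hm @ W @ hn                      # pairwise interaction score
        interaction_vecs.append(score * (hm + hn) / 2.0)
interaction_vecs = np.stack(interaction_vecs)    # (9, d): 3x3 modality pairs

# Selection level: weight each interaction vector by an importance score
# (here a dot product with a hypothetical vector v) and sum them into a
# single representation for the final prediction head.
v = rng.standard_normal(d)
weights = softmax(interaction_vecs @ v)          # one weight per vector
representation = weights @ interaction_vecs      # shape (d,)

print(representation.shape)
```

The selection level here is just soft attention over the nine interaction vectors; a classifier or regressor for sentiment or emotion would consume `representation`.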