Abstract

As an active research topic in natural language processing, affective computing, and multimedia analysis, multi-modal sentiment analysis (MSA) has been widely explored on both aspect-level and sentence-level tasks. However, existing studies typically rely on large amounts of annotated multi-modal data, which are difficult to collect because of the heavy expenditure of manpower and resources, especially in open-ended and fine-grained domains. It is therefore necessary to investigate the few-shot scenario for MSA. In this paper, we propose a prompt-based vision-aware language modeling (PVLM) approach to MSA that requires only a small amount of supervised data. Specifically, PVLM incorporates visual information into a pre-trained language model and leverages prompt tuning to bridge the gap between the masked language prediction objective used in pre-training and downstream MSA tasks. Systematic experiments on three aspect-level and two sentence-level MSA datasets demonstrate the effectiveness of our few-shot approach.
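To make the general idea concrete, the sketch below illustrates one way a prompt-based, vision-aware setup of this kind can be wired together: visual features are projected into the word-embedding space and prepended to a cloze-style prompt, and the sentiment label is read off the masked-language-model logits through a verbalizer. This is a minimal illustration under our own assumptions, not the authors' implementation; the prompt template, the label words, the use of bert-base-uncased, the 2048-dimensional image features, and the name visual_proj are all illustrative choices.

import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Verbalizer: one (assumed) single-wordpiece label word per sentiment class.
label_words = ["great", "terrible", "fine"]   # positive / negative / neutral
label_ids = tokenizer.convert_tokens_to_ids(label_words)

# Project precomputed image features (e.g., 2048-d CNN region features) into
# the language model's embedding space so they act as extra "visual tokens".
visual_proj = nn.Linear(2048, mlm.config.hidden_size)

def predict_sentiment(text, image_feats):
    # image_feats: [1, num_regions, 2048] tensor of frozen visual features.
    # Cloze-style prompt that reuses the pre-training MLM objective.
    prompt = f"{text} The sentiment is {tokenizer.mask_token}."
    enc = tokenizer(prompt, return_tensors="pt")

    # Token embeddings from the PLM, with projected visual tokens prepended.
    tok_emb = mlm.get_input_embeddings()(enc["input_ids"])   # [1, L, H]
    vis_emb = visual_proj(image_feats)                       # [1, R, H]
    inputs_embeds = torch.cat([vis_emb, tok_emb], dim=1)     # [1, R+L, H]
    attn = torch.cat(
        [torch.ones(vis_emb.shape[:2], dtype=torch.long), enc["attention_mask"]],
        dim=1,
    )

    logits = mlm(inputs_embeds=inputs_embeds, attention_mask=attn).logits

    # Locate [MASK] (shifted right by the number of visual tokens) and score
    # only the verbalizer words, yielding one logit per sentiment class.
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    return logits[0, vis_emb.shape[1] + mask_pos, label_ids]

scores = predict_sentiment("the pasta was amazing", torch.randn(1, 4, 2048))
print(scores.softmax(-1))   # class probabilities: positive / negative / neutral

In a few-shot setting, only the handful of labeled examples would be used to tune the prompt-related parameters (and optionally the projection layer), which is what lets the masked-language-prediction head carry most of the work.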
