Lexicon‐based fine‐tuning of multilingual language models for low‐resource language sentiment analysis

Vinura Dhananjaya,Surangika Ranathunga,Sanath Jayasena

doi:10.1049/cit2.12333

Open Access

https://doi.org/10.1049/cit2.12333

Journal: CAAI Transactions on Intelligence Technology
Publication Date: Apr 1, 2024
License type: CC BY 4.0

Affiliation: University of Moratuwa, Massey University

#Models For Sentiment Classification #Low‐resource Languages

Abstract

Pre‐trained multilingual language models (PMLMs) such as mBERT and XLM‐R have shown good cross‐lingual transferability. However, they are not specifically trained to capture cross‐lingual signals concerning sentiment words. This is a disadvantage for low‐resource languages (LRLs), which are under‐represented in these models. To better fine‐tune these models for sentiment classification in LRLs, the authors introduce a novel intermediate task fine‐tuning (ITFT) technique based on a sentiment lexicon of a high‐resource language (HRL). They experiment with the LRLs Sinhala, Tamil and Bengali on a 3‐class sentiment classification task and show that the method outperforms vanilla fine‐tuning of the PMLM. It also outperforms or is on par with basic ITFT, which relies on an HRL sentiment classification dataset.
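The abstract does not say how the HRL sentiment lexicon is turned into an intermediate training task. As a rough illustration only, the sketch below assumes one plausible reading: each lexicon entry becomes a single labelled example, and the model is fine-tuned on those examples before the LRL sentiment data. The names `lexicon_to_examples`, `training_schedule`, the label ids and the toy data are all hypothetical, and no actual model training is performed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Label ids for a 3-class task; this particular id assignment is an assumption.
LABELS = {"negative": 0, "neutral": 1, "positive": 2}


@dataclass(frozen=True)
class Example:
    text: str
    label: int


def lexicon_to_examples(lexicon: Dict[str, str]) -> List[Example]:
    """Turn each (word, polarity) lexicon entry into one labelled example.

    Assumption: the intermediate task is word-level polarity classification.
    """
    examples = []
    for word, polarity in sorted(lexicon.items()):
        if polarity not in LABELS:
            continue  # skip entries whose polarity tag is not one of the 3 classes
        examples.append(Example(text=word, label=LABELS[polarity]))
    return examples


def training_schedule(
    lexicon_examples: List[Example], lrl_examples: List[Example]
) -> List[Tuple[str, List[Example]]]:
    """Order the fine-tuning stages: intermediate task first, target task second."""
    return [
        ("intermediate: HRL lexicon", lexicon_examples),
        ("target: LRL sentiment", lrl_examples),
    ]


# Hypothetical toy data, not taken from the paper.
toy_lexicon = {"good": "positive", "bad": "negative", "table": "neutral", "odd": "mixed"}
toy_lrl = [Example("<LRL sentence>", LABELS["positive"])]

stage_one = lexicon_to_examples(toy_lexicon)
schedule = training_schedule(stage_one, toy_lrl)
for name, data in schedule:
    print(name, len(data))
```

In a real pipeline each stage in `schedule` would be passed in order to the same PMLM fine-tuning routine, so that the weights learned on the lexicon task initialise the LRL sentiment classification stage.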
