Multilingual Sentiment Analysis: A Systematic Literature Review

Nur Atiqah Sia Abdullah, Nur Ida Aniza Rusli

doi:10.47836/pjst.29.1.25

Abstract

With the explosive growth of social media, members of the online community can freely express their opinions without disclosing their identities. People with hidden agendas can easily post fake opinions to discredit target products, services, politicians, or organizations. At this scale of data, monitoring opinions and distilling their sentiment remains a formidable task because of the proliferation of diverse sites carrying a large volume of opinions expressed in multiple languages. This paper therefore provides a systematic literature review of multilingual sentiment analysis, summarising the languages commonly supported, the pre-processing techniques, the existing sentiment analysis approaches, and the evaluation models that have been used. The review found that most of the models supported two languages and that English is the most frequently used language in sentiment analysis studies. None of the reviewed literature addresses the combination of English, Chinese, Malay, and Hindi in multilingual sentiment analysis. The common pre-processing techniques for the multilingual domain are tokenization, normalization, capitalization, N-gram, and machine translation. The classification technique reported for multilingual sentiment is hybrid sentiment analysis, which comprises localized language analysis and unsupervised topic clustering, followed by multilingual sentiment analysis. In terms of evaluation, most of the studies used precision, recall, and accuracy as the measures for reporting results.
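As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall, and accuracy could be computed for one sentiment class. It is not taken from any of the reviewed studies; the function name `evaluate`, the label strings, and the sample data are assumptions made for this example only.

```python
def evaluate(gold, predicted, positive="positive"):
    """Return (precision, recall, accuracy) for one target class.

    `gold` and `predicted` are equal-length lists of label strings.
    The label names are hypothetical and chosen only for illustration.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: predicted the target class and it was correct.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted the target class but it was wrong.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: missed an instance of the target class.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy counts every correctly labelled item, whatever its class.
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Made-up labels for five opinions (not data from the review).
gold = ["positive", "negative", "positive", "neutral", "positive"]
predicted = ["positive", "positive", "negative", "neutral", "positive"]
precision, recall, accuracy = evaluate(gold, predicted)
```

In this made-up sample, two of the three items predicted as positive are correct and two of the three truly positive items are found, while three of the five labels match overall.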

Full Text