BERT base model for toxic comment analysis on Indonesian social media

Ghinaa Zain Nabiilah, Simeon Yuda Prasetyo, Zahra Nabila Izdihar, Abba Suganda Girsang

doi:10.1016/j.procs.2022.12.188

Open Access

https://doi.org/10.1016/j.procs.2022.12.188

Journal: Procedia Computer Science
Publication Date: Jan 1, 2023
Citations: 8
License type: CC BY-NC-ND

Affiliation: Binus University

Abstract

Social media is an online medium that serves as a platform for users to participate in, share, create, and exchange information through various forums and social networks. The rapid growth in social media activity has increased the number of comments posted. Because open discussions form easily between users, these comments are prone to triggering debate. Such debates often turn negative and escalate into heated conflicts, in which users post comments containing toxic words to argue with and corner a person or group. This study conducted an experiment to detect comments containing toxic sentences on Indonesian social media using pre-trained models trained for Indonesian. The study performed multilabel classification and evaluated the classification results produced by the Multilingual BERT (MBERT), IndoBERT, and Indo Roberta Small models. The best result was obtained with the IndoBERT model, which achieved an F1 score of 0.8897.
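To make the evaluation setting concrete, the following is a minimal sketch of how an F1 score can be computed for multilabel predictions, where each comment may carry several labels at once. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say which F1 averaging was used, so micro-averaging is assumed here, and the 0.5 threshold, the three label columns, and the function names `binarize` and `micro_f1` are all hypothetical.

```python
from typing import List


def binarize(probs: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 decisions (threshold is an assumption)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1: pool true/false positives over all comments and labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical data: 3 comments, 3 illustrative toxicity labels per comment.
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]
probs = [[0.9, 0.2, 0.4], [0.1, 0.6, 0.3], [0.8, 0.7, 0.1]]

y_pred = binarize(probs)
score = micro_f1(y_true, y_pred)
print(round(score, 4))  # prints 0.75
```

In this toy example there are 3 true positives, 1 false positive, and 1 false negative, giving F1 = 2·3 / (2·3 + 1 + 1) = 0.75. With a fine-tuned model, the probabilities would typically come from applying a sigmoid to each label's output logit.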

Full Text