Abstract
The proliferation of online toxicity, characterized by offensive and disrespectful language, has been a pervasive issue in Indonesia's digital environment, impacting users' mental health and well-being. At the same time, Natural Language Processing (NLP) offers a promising avenue for detecting and managing toxic comments. This study presents a three-stage methodology, covering type, target audience, and topic, to detect and categorize online toxicity in the Indonesian language using fine-tuned IndoBERTweet and Indonesian RoBERTa models. The results indicate that the IndoBERTweet model, with optimally adjusted hyperparameters, consistently outperforms the Indonesian RoBERTa model in all stages of the proposed methodology, as shown by its higher precision, recall, and F1 scores. The model also performs well in a real-world setting, accurately classifying new Indonesian-language content from Twitter (now X). This research is a stepping stone for future work, including exploring other language models, applying the methodology to other languages, training the models on larger and more diverse datasets, and applying it to other social media platforms or forums. Our proposal contributes to creating safer online spaces, and the results provide insights for the development of automated moderation tools, which can play a significant role in combating online harassment and supporting the well-being of online communities.