Utilizing Translation to Enhance NLP Models in Offensive Language and Hate Speech Identification

Sandy Kurniawan, Indra Budi

doi:10.62885/improsci.v1i4.187

Abstract

The number of social media users in Indonesia has increased in recent years, and this growth has brought more offensive language onto these platforms. Because offensive language can trigger conflicts between users, it needs to be identified. This study focuses on identifying offensive language, hate speech, and hate speech targets on Twitter. The data were obtained from previous research on offensive language and hate speech identification. Because the amount of training data strongly influences classification performance, this study adds training data by means of translation. Classical machine learning algorithms (SVM and others) and deep learning algorithms (BiLSTM, CNN, and LSTM) are used as classifiers, with word n-grams and word embeddings as features. Three scenarios were evaluated, which differ in the training data used to develop the classification model. The results show that scenario 3, which uses translation for data augmentation, can improve the classification model’s performance by 5%.
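
The translation-based augmentation and word n-gram features described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the translation direction or tool, so the sketch assumes labeled examples in another language are translated into the target language and added to the training set with their labels unchanged. The names `word_ngrams`, `augment_with_translation`, `toy_translate`, and `TOY_DICTIONARY` are hypothetical, and the dictionary lookup only stands in for a real machine translation system.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, label)


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> Counter:
    """Count word n-grams (n_min..n_max) over lowercased whitespace tokens."""
    tokens = text.lower().split()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def augment_with_translation(
    native: List[Example],
    foreign: List[Example],
    translate: Callable[[str], str],
) -> List[Example]:
    """Return the native examples plus translated copies of the foreign ones.

    Labels are carried over unchanged (an assumption of this sketch);
    translations that duplicate an existing text are skipped.
    """
    seen = {text for text, _ in native}
    augmented = list(native)
    for text, label in foreign:
        translated = translate(text)
        if translated and translated not in seen:
            augmented.append((translated, label))
            seen.add(translated)
    return augmented


# Hypothetical stand-in for a machine translation system (assumption).
TOY_DICTIONARY = {"you": "kamu", "are": "adalah", "stupid": "bodoh"}


def toy_translate(text: str) -> str:
    """Word-by-word lookup; unknown words are passed through unchanged."""
    return " ".join(TOY_DICTIONARY.get(w, w) for w in text.lower().split())


# Usage: enlarge a small training set, then extract n-gram features.
native_data = [("kamu baik", "not_offensive")]
foreign_data = [("you are stupid", "offensive")]
training_data = augment_with_translation(native_data, foreign_data, toy_translate)
features = [(word_ngrams(text), label) for text, label in training_data]
```

The resulting n-gram counts could then be fed to any classifier, such as an SVM. Whether translated examples actually help depends on translation quality and on how well labels transfer across languages, which this sketch does not address.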

Journal: Jurnal Improsci
Publication Date: Feb 16, 2024
License type: CC BY 4.0

