Translated vs Non-Translated Method for Multilingual Hate Speech Identification in Twitter

Muhammad Okky Ibrohim, Indra Budi

doi:10.18517/ijaseit.9.4.8123

Abstract

Nowadays, social media is often misused to spread hate speech. Spreading hate speech is an act that needs to be handled in a special way because it can undermine or discriminate against other people and cause conflicts that lead to both material and immaterial losses. There are several challenges in building a hate speech identification system; one of them is identifying hate speech in a multilingual scope. In this paper, we adapt and compare two methods for multilingual text classification, the translated method (with and without language identification) and the non-translated method, for multilingual hate speech identification (covering Hindi, English, and Indonesian) using a machine learning approach. We use several classification algorithms (classifiers), namely Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT), with word n-grams and char n-grams (character n-grams) for feature extraction. Our experimental results show that the non-translated method gives the best result. However, the use of the non-translated method needs to be reconsidered because it requires more cost for data collection and annotation. Meanwhile, the translated method without language identification gives poor results. To address this problem, we combine the translated method with monolingual hate speech identification, and the experimental results show that this approach improves multilingual hate speech identification performance compared to the translated method without language identification. This paper discusses the advantages and disadvantages of each method and future work to enhance performance in multilingual hate speech identification.
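To make the compared setups more concrete, here is a minimal, stdlib-only Python sketch of the kind of pipeline the abstract describes: word n-gram and char n-gram features feeding a multinomial Naive Bayes classifier, with an optional translation step that separates the translated from the non-translated method. It is an illustrative assumption, not the authors' implementation. The n-gram ranges (word 1-2, char 3-4), the smoothing constant, the `prepare`/`translate` hook, and the toy tweets are all hypothetical choices, and language identification is not modeled.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_values=(1, 2)):
    """Word n-grams over lowercased, whitespace-split tokens."""
    tokens = text.lower().split()
    feats = []
    for n in n_values:
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def char_ngrams(text, n_values=(3, 4)):
    """Character n-grams over the lowercased string (spaces included)."""
    s = text.lower()
    feats = []
    for n in n_values:
        feats += [s[i:i + n] for i in range(len(s) - n + 1)]
    return feats


def extract_features(text):
    # Prefix each feature with its type so a word and a char n-gram never collide.
    return (["w:" + f for f in word_ngrams(text)]
            + ["c:" + f for f in char_ngrams(text)])


def prepare(texts, translate=None):
    """Non-translated method: keep the original multilingual text (translate=None).
    Translated method: map every text to one pivot language first.
    `translate` is a hypothetical hook, e.g. a wrapper around an MT service."""
    return list(texts) if translate is None else [translate(t) for t in texts]


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        # Total feature occurrences per class, used in the smoothed denominator.
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feat_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for f in extract_features(text):
                if f not in self.vocab:
                    continue  # features never seen in training are ignored
                num = self.feat_counts[label][f] + self.alpha
                den = self.totals[label] + self.alpha * v
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Invented toy tweets (English + Indonesian); "xyzslur" stands in for an abusive term.
    texts = ["have a nice day", "selamat pagi teman", "go away xyzslur", "pergi kau xyzslur"]
    labels = ["non-HS", "non-HS", "HS", "HS"]
    model = NaiveBayes().fit(prepare(texts), labels)  # non-translated setting
    print(model.predict("xyzslur everywhere"))  # prints HS
```

In this sketch the only difference between the two methods is whether `prepare` receives a `translate` function. The non-translated method needs labeled data in every target language (the data collection and annotation cost noted in the abstract), while the translated method can presumably reuse a training set in a single language at the price of translation errors. SVM and random forest classifiers could be swapped in over the same features, for example with scikit-learn's `LinearSVC` and `RandomForestClassifier`.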

Journal: International Journal on Advanced Science, Engineering and Information Technology
Publication Date: Aug 2, 2019
Citations: 13

Similar Papers

Identification of hate speech and abusive language on Indonesian Twitter using the Word2vec, part of speech and emoji features
Muhammad Okky Ibrohim ... Indra Budi
15 Nov 2019

Hate speech detection in the Indonesian language: A dataset and preliminary study
Ika Alfina ... Mohamad Ivan Fanany
01 Oct 2017

Abusive Language and Hate Speech Detection for Indonesian-Local Language in Social Media Text
Shofianina Dwi Ananda Putri ... Muhammad Okky Ibrohim
01 Jan 2020

Hierarchical Multi-label Classification to Identify Hate Speech and Abusive Language on Indonesian Twitter
Faizal Adhitama Prabowo ... Muhammad Okky Ibrohim
01 Sep 2019
