Study of Undersampling Method: Instance Hardness Threshold with Various Estimators for Hate Speech Classification

Naufal Azmi Verdikha, Teguh Bharata Adji, Adhistya Erna Permanasari

doi:10.22146/ijitee.42152

Abstract

A text classification system is needed to address the problem of hate speech on social media. However, hate speech texts are very hard to find on social media, which makes the class distribution of the training data imbalanced. Classification on imbalanced data performs poorly. Several methods address this problem; one of them is undersampling with the Instance Hardness Threshold (IHT) method. IHT balances the dataset by eliminating instances that are frequently misclassified. To find those instances, IHT requires an estimator, which is a classifier. This research compares estimators for the IHT method on the imbalanced data problem in hate speech classification using the TF-IDF weighting method. It uses the class ratio of the dataset after undersampling, the time taken by the undersampling process, and the Index of Balanced Accuracy (IBA) to determine the best IHT method. The results show that IHT with Logistic Regression as the estimator (IHT(LR)) has the fastest undersampling process (1.91 s), perfectly balances the dataset with a class ratio of 1:1, and achieves the best IBA across all estimation processes. These results make IHT(LR) the best method for solving the imbalanced data problem in hate speech classification.
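The two ideas the abstract relies on, removing hard-to-classify majority instances and scoring with IBA, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the estimator's probability for each sample's true class is already available (the `p_true` values below are hypothetical), and it keeps exactly as many majority samples as there are minority samples, whereas threshold-based implementations may retain more when probabilities tie. The IBA function uses the common form (1 + alpha * (TPR - TNR)) * TPR * TNR; the default `alpha = 0.1` is an assumption, since the abstract does not state the value used.

```python
from typing import List, Sequence


def iht_undersample(labels: Sequence[int], p_true: Sequence[float],
                    majority_label: int) -> List[int]:
    """Return the sorted indices of the samples to keep.

    labels[i] : class label of sample i
    p_true[i] : estimator's probability for the true class of sample i
                (assumed given); instance hardness is 1 - p_true[i]
    """
    if len(labels) != len(p_true):
        raise ValueError("labels and p_true must have the same length")
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    # Easiest majority samples first (highest probability of the true class).
    majority.sort(key=lambda i: p_true[i], reverse=True)
    # Drop the hardest majority samples until both classes have equal size.
    kept_majority = majority[:len(minority)]
    return sorted(minority + kept_majority)


def index_balanced_accuracy(tpr: float, tnr: float, alpha: float = 0.1) -> float:
    """IBA with the squared geometric mean (TPR * TNR) as the base metric."""
    dominance = tpr - tnr
    return (1.0 + alpha * dominance) * tpr * tnr


# Toy data: label 1 = hate speech (minority), label 0 = other (majority).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
p_true = [0.95, 0.40, 0.90, 0.15, 0.70, 0.55, 0.80, 0.30]  # hypothetical values

kept = iht_undersample(labels, p_true, majority_label=0)
balanced_labels = [labels[i] for i in kept]
print(kept, balanced_labels)
print(index_balanced_accuracy(tpr=0.8, tnr=0.9))
```

In practice `p_true` would come from cross-validated predictions of the chosen estimator (Logistic Regression in the best-performing configuration reported above) on TF-IDF features, which is the part the study varies.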

Journal: IJITEE (International Journal of Information Technology and Electrical Engineering)	Publication Date: Dec 26, 2018
Citations: 9	License type: CC BY-NC-ND 4.0

Similar Papers

A Measurement Study of Hate Speech in Social Media
Mainack Mondal ... Leandro Araújo Silva
04 Jul 2017

Lexicon-Based Indonesian Local Language Abusive Words Dictionary to Detect Hate Speech in Social Media
Mardhiya Hayaty ... Anggit Dwi Hartanto
Journal of Information Systems Engineering and Business Intelligence | VOL. 6
27 Apr 2020

Hate Speech Detection in Code-Mixed Indonesian Social Media: Exploiting Multilingual Languages Resources
Endang Wahyu Pamungkas ... Azizah Fatmawati
08 Dec 2022

The Enforcement of Criminal Laws of Hate Speech in Social Media
Feri Vernando Situngkir ... Siti Rodhiyah Dwi Istinah
Law Development Journal | VOL. 2
14 Feb 2021