Hate Speech Detection on Twitter through Natural Language Processing using LSTM Model

Hani Nurrahmi, Cepthari Ningtyas Arbaatun, Dade Nurjanah

doi:10.47065/bits.v4i3.2718


Open Access



Abstract

Social media is now a common place to express opinions, which may be positive or negative. Lately, however, negative opinions such as hate speech have become frequent. Hate speech often appears on social media as malicious comments intended to insult individuals or groups. According to WeAreSocial data from 2021, Twitter is one of the most used social media platforms in Indonesia, with 63.6% of users. According to the Indonesian National Police, hate speech cases were more dominant during the period from April 2020 to July 2021. Efforts are therefore needed to identify hate speech on Twitter. One way to detect hate speech is deep learning. In this research, we use a Long Short-Term Memory (LSTM) deep learning model with word embeddings. FastText and Global Vectors (GloVe) are the word embeddings we use as input for word representation and classification. FastText builds word embeddings from subword information, whereas GloVe uses an unsupervised learning method trained on a corpus to generate distributional feature vectors. In the evaluation of the experimental models, LSTM-FastText with random oversampling performs best, with an F1-score of 89.91%, compared with 82.14% for LSTM-GloVe.
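The abstract names two steps that a small example can make concrete: random oversampling and the F1-score. The following is a minimal sketch in plain Python, not the authors' implementation. The function names `random_oversample` and `f1_score`, the toy tweets, and the choice of a binary F1 on the positive (hate speech) class are all assumptions made for illustration; the abstract does not say how the F1-score was averaged.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class is as large as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        # Pool of original samples belonging to this class.
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed): label 1 = hate speech, label 0 = not hate speech.
tweets = ["tweet a", "tweet b", "tweet c", "tweet d", "tweet e"]
labels = [0, 0, 0, 0, 1]

balanced_tweets, balanced_labels = random_oversample(tweets, labels)
print(Counter(balanced_labels))

# Hypothetical predictions, only to exercise the metric.
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a setup like this, oversampling would normally be applied to the training split only, so that duplicated tweets do not leak into the data used for evaluation.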
