Optimizing Deep Learning for Detection Cyberbullying Text in Indonesian Language

Laksmi Anindyati, Ade Nursanti, Ayu Purwarianti

doi:10.1109/icaicta.2019.8904108

Abstract

Cyberbullying in Indonesia has become a concern due to the increasing use of social media. Detecting cyberbullying is an important step toward a healthy environment for social media interactions. This research belongs to computational linguistics and focuses on using deep learning to detect bullying sentences on Twitter. The study involves two main processes: first, forming a word representation; second, classifying sentences to detect bullying. The word representation is built in a separate pre-training step, using Word2vec as the tool. Two types of data are used in pre-training. The first type consists only of the training and test data, while the second is the overall data, a total of 26,800 unique Twitter sentences that includes the training and test data. The classification process is built on three main algorithms that are popular for text classification: LSTM, bi-LSTM, and CNN. The dataset consists of 9,854 labeled sentences extracted from 2,584 Twitter conversations. In the dataset, 1,680 sentences are labeled as bully and 6,343 sentences are labeled as neutral. A total of 504 experiments are conducted by varying the preprocessing stage that determines the machine learning features, the dropout layer configuration, and the deep learning algorithm. The experiments show that the accuracy score reaches 90.57%, while the recall score for the bully class reaches 75.7%.
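
The abstract reports both overall accuracy and recall for the bully class. Because neutral sentences outnumber bully sentences in the dataset, the two scores can differ considerably. The following is a minimal sketch of how these two metrics are commonly computed. It is not the authors' evaluation code; the function names and the toy labels are invented here purely for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of sentences whose predicted label matches the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "bully") -> float:
    """Fraction of gold `positive` sentences that are also predicted `positive`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Keep only the predictions made for sentences whose gold label is `positive`.
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not relevant:
        raise ValueError(f"no gold examples with label {positive!r}")
    return sum(p == positive for p in relevant) / len(relevant)


# Toy labels, invented for illustration only (not data from the paper).
gold = ["bully", "neutral", "neutral", "bully", "neutral", "neutral", "neutral", "bully"]
pred = ["bully", "neutral", "neutral", "neutral", "neutral", "neutral", "bully", "bully"]

toy_accuracy = accuracy(gold, pred)
toy_bully_recall = recall(gold, pred, positive="bully")
print(f"accuracy = {toy_accuracy:.2f}, bully recall = {toy_bully_recall:.2f}")
```

In this toy case one of the three bully sentences is missed, so recall for the bully class is lower than overall accuracy, which is the same kind of gap the abstract reports.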

Full Text