Detection and Classification of Toxic Comments by Using LSTM and Bi-LSTM Approach

Simrann Arora, Rachna Jain, Akash Gupta, Anand Nayyar

doi:10.1007/978-981-16-3660-8_10

Abstract

With advances in technology, a large number of comments are produced every day on online communication platforms such as Wikipedia, Twitter, and Glassdoor. Although many of these comments benefit people, highly toxic comments contribute to increasing online harassment, mental depression, and even personal attacks. Toxic comment classification is therefore an active research topic. This study presents a multi-label classification model that assigns toxic comments to six classes: toxic, severe toxic, obscene, threat, insult, and identity hate. The proposed model is built using deep learning algorithms, specifically Long Short-Term Memory (LSTM) and Bi-Directional Long Short-Term Memory (Bi-LSTM), together with word embeddings, adapting insights from previously proposed works. The dataset for this research is obtained from Kaggle and is provided by the Conversation AI team (a research initiative co-founded by Google and Jigsaw). The accuracy of both techniques is evaluated and compared. The empirical results show that the Bi-LSTM algorithm outperformed LSTM, achieving a higher accuracy of 98.07%.

Keywords

Toxic comments classification, LSTM, Bi-LSTM, Word embeddings, Multi-label classification, Online harassment, Personal attack

Full Text