Abstract

Hate speech and abusive words spread widely on social media. The impact of hate speech on social media is hazardous, which can lead to discrimination, social conflict, and even genocide. Hate speech also has target types, categories, and levels. This research discusses the classification of hate speech and abusive words in the text on social media Twitter in Indonesian, English, and a mixture of both up to the types, categories, and levels. Classification of hate speech multilabel text is investigated using RFDT, BiLSTM, and BiLSTM with the pre-trained BERT model. The Classifier Chains, Label Powerset, and Binary Relevance methods are also used as data transformation, and TF-IDF is also used as feature extraction combined with the RFDT classification method. Some scenarios of the preprocessing stage are also carried out to find the best results, namely full preprocess, without stopword removal, and without stemming and without stopword removal. The problem of having Indonesian, English, and a mixture of both is solved in two ways, namely, without being translated and translated into Indonesian. The best results with an accuracy of 76.12% were obtained using the RFDT classification method with Classifier Chains, without translation, without stemming, and without stopword removal. This research also shows that the translation, stemming, and stopword removal are not effective, and the problem of dependencies between labels greatly affects the results of classification.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.