Tuning Hyperparameters of Machine Learning Methods for Afan Oromo Hate Speech Text Detection for Social Media

Naol Bakala Defersha, Karthikeyan Kaliyaperumal, Kula Kekeba

doi:10.1109/iccct53315.2021.9711850

Abstract

With the rapidly growing penetration of social media networks in linguistically diverse and multicultural developing nations like Ethiopia, online conversations have become increasingly casual and multilingual. The emergence of hate speech on these platforms has created a need for automated hate speech text detection systems. Various automated hate speech detection and classification systems have been developed for resource-rich languages such as English and French, even though online users write in many other languages on different social media platforms. Afan Oromo is one such language, used by social media users to express feelings and emotions and to share messages. Hence, there is an urgent need for an intelligent system that can automatically detect and classify hate speech, especially for resource-scarce indigenous Ethiopian languages like Afan Oromo. This work addresses the identification of hate speech in comments and posts written in Afan Oromo, a resource-scarce language. We prepared the first Afan Oromo hate speech text detection dataset, containing comments and posts from social media. Then, n-gram and TF-IDF feature selection approaches were employed to select features. After the important features were selected, natural language processing tasks were applied to the dataset. We applied six machine learning classifiers, with default and with tuned parameters, to detect hate speech in posts and comments. The experiments show that the Support Vector Machine outperformed the other classifiers on the Afan Oromo hate speech text detection dataset, achieving an F-measure of 92%. The Afan Oromo hate speech text dataset is publicly available at https://www.naolinfo.info/ for further research.
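To make the pipeline described in the abstract more concrete (n-gram TF-IDF features, a linear SVM, and hyperparameter tuning scored by F-measure), here is a minimal, dependency-free sketch. It is not the authors' implementation: the whitespace tokenisation, the smoothed IDF formula, the use of the Pegasos stochastic sub-gradient method to train the SVM, the hold-out grid search, and the grid of regularisation values are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random
from collections import Counter


def ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max from a lower-cased, whitespace-tokenised text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


def fit_idf(docs, n_max=2):
    """Smoothed IDF, ln((1 + N) / (1 + df)) + 1, for every n-gram in the training docs."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, n_max)))
    n_docs = len(docs)
    return {term: math.log((1 + n_docs) / (1 + c)) + 1.0 for term, c in df.items()}


def tfidf(doc, idf, n_max=2):
    """Sparse, L2-normalised TF-IDF vector (a dict); n-grams unseen in training are dropped."""
    counts = Counter(t for t in ngrams(doc, n_max) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty lam) fitted with Pegasos sub-gradient steps.
    X: list of sparse dict vectors; y: labels in {0, 1}. No bias term, for simplicity."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                  # Pegasos step size
            yi = 1 if y[i] == 1 else -1            # map {0, 1} to {-1, +1}
            margin = yi * sum(w.get(f, 0.0) * v for f, v in X[i].items())
            for f in w:                            # shrink: gradient of the L2 penalty
                w[f] *= 1.0 - eta * lam
            if margin < 1:                         # hinge loss active: move towards x_i
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * yi * v
    return w


def predict(w, x):
    """1 (hate) if the linear score is positive, else 0 (not hate)."""
    return 1 if sum(w.get(f, 0.0) * v for f, v in x.items()) > 0 else 0


def f1_score(y_true, y_pred, positive=1):
    """F-measure of the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_lambda(train_docs, train_y, val_docs, val_y, grid=(1.0, 0.1, 0.01, 0.001)):
    """Hold-out grid search: keep the regularisation strength with the best validation F1.
    The grid values are hypothetical, not the paper's settings."""
    idf = fit_idf(train_docs)
    X_tr = [tfidf(d, idf) for d in train_docs]
    X_val = [tfidf(d, idf) for d in val_docs]
    best_lam, best_f1 = None, -1.0
    for lam in grid:
        w = train_linear_svm(X_tr, train_y, lam=lam)
        score = f1_score(val_y, [predict(w, x) for x in X_val])
        if score > best_f1:
            best_lam, best_f1 = lam, score
    return best_lam, best_f1
```

Comparing the F-measure obtained with a fixed "default" `lam` against the value selected by `tune_lambda` mirrors, in miniature, the default-versus-tuned comparison the abstract reports. In a real experiment one would more likely use an established library (for example scikit-learn's `TfidfVectorizer`, `LinearSVC` and `GridSearchCV`) and cross-validation rather than a single hold-out split.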

Full Text