Twitter is now routinely exploited to propagate torrents of hate speech, including misogynistic and misandrist tweets, often written in slang. Machine learning methods have been explored in many studies to address the inherent challenges of hate speech detection in online spaces. Nevertheless, language has subtleties that make it difficult for machines to adequately comprehend and disambiguate the semantics of words whose meaning depends heavily on the context of use. Deep learning methods have demonstrated promising results for automatic hate speech detection, but they require a significant volume of training data. Classical machine learning methods suffer from the inherent problem of high variance, which in turn degrades the performance of hate speech detection systems. This study presents a voting ensemble machine learning method that harnesses the strengths of logistic regression, decision trees, and support vector machines for the automatic detection of hate speech in tweets. The method was evaluated against ten widely used machine learning methods on two standard tweet data sets using standard performance evaluation metrics, and it achieved an improved average F1-score of 94.2%.
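The voting idea described above can be illustrated with a minimal sketch. It assumes hard (majority) voting over binary labels, which the abstract does not specify, and the per-model predictions are hypothetical values chosen for illustration, not results from the study. The helper names `majority_vote` and `f1_score` are likewise illustrative.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by hard voting."""
    combined = []
    for votes in zip(*predictions):
        # Pick the label that most models agree on for this sample.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = hate speech, 0 = not) and hypothetical outputs
# of three base models; none of these values come from the paper.
y_true = [1, 0, 1, 1, 0, 0]
lr_pred = [1, 0, 1, 0, 0, 1]   # stand-in for logistic regression
dt_pred = [1, 1, 1, 1, 0, 0]   # stand-in for a decision tree
svm_pred = [0, 0, 1, 1, 0, 0]  # stand-in for a support vector machine

ensemble_pred = majority_vote([lr_pred, dt_pred, svm_pred])
print(ensemble_pred, f1_score(y_true, ensemble_pred))
```

In this toy case each base model errs on a different sample, so the majority vote cancels the individual mistakes. That is the intuition behind using an ensemble to reduce the variance of single classifiers.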