Abstract

Detecting and removing hate speech content in a timely manner remains a challenge for social media platforms. Automated techniques such as deep learning models offer solutions that can keep up with the volume and velocity of user content production. Research in this area has focused mainly on binary classification or on classifying tweets into generalised categories such as hateful, offensive, or neither; less attention has been given to multiclass classification of online hate speech by the type of hate or the group at which it is directed. By aggregating and re-annotating several relevant hate speech datasets, this study presents a dataset for classifying tweets into the categories ethnicity, gender, religion, sexuality, and non-hate, and evaluates several models on it: logistic regression, LSTM, BERT, and GPT-2. For the LSTM model, we assess a range of NLP features and conclude that the highest-performing feature combination consists of word $n$-grams, character $n$-grams, and dependency tuples. We show that while more recent, larger models can achieve slightly higher performance, increased model complexity alone is not sufficient to yield significantly better classifiers. We also compare this approach with a binary classification approach and evaluate the effect of dataset size on model performance.
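To make the feature combination concrete, the following is a minimal sketch of word and character $n$-gram extraction in plain Python. The function names (`word_ngrams`, `char_ngrams`, `featurise`) and the `n` defaults are illustrative assumptions, not the paper's implementation; dependency tuples, which would require a syntactic parser such as spaCy, are noted but omitted.

```python
from collections import Counter

def word_ngrams(text, n=2):
    # Split on whitespace and slide a window of n tokens.
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n=3):
    # Character n-grams capture sub-word patterns (e.g. obfuscated slurs).
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def featurise(text):
    # Combine word and character n-grams into one bag-of-features Counter,
    # prefixing each feature with its type to keep the spaces disjoint.
    # Dependency tuples (head, relation, child) would come from a parser
    # such as spaCy and are omitted here for brevity.
    feats = Counter()
    feats.update(("w", g) for g in word_ngrams(text))
    feats.update(("c", g) for g in char_ngrams(text))
    return feats

print(featurise("some example tweet")[("w", "some example")])  # → 1
```

In a full pipeline, such counts would typically be vectorised (e.g. via hashing or a fitted vocabulary) before being fed to the logistic regression or LSTM classifiers the abstract describes.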