Abstract

Social media networks such as Twitter facilitate mass communication but are increasingly used to propagate hate speech. Recent studies have highlighted a strong correlation between hate speech propagation and hate crimes such as xenophobic attacks. Given the scale of social media and the consequences of hate speech for society, it is essential to develop automated methods for hate speech detection across social media platforms. Several studies have investigated the application of different machine learning algorithms to hate speech detection. However, the performance of these algorithms is generally hampered by inefficient sequence transduction. Vanilla recurrent neural networks and recurrent neural networks with attention have been established as state-of-the-art methods for sequence modeling and sequence transduction tasks. Unfortunately, these methods suffer from intrinsic problems such as difficulty capturing long-term dependencies and a lack of parallelization. In this study, we investigate a transformer-based method and test it on a publicly available multiclass hate speech corpus containing 24,783 labeled tweets. The DistilBERT transformer was compared against attention-based recurrent neural networks and other transformer baselines for hate speech detection in Twitter documents. The results show that DistilBERT outperformed the baseline algorithms while allowing parallelization.
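The parallelization argument in the abstract can be illustrated with a minimal sketch. This is not the study's model: the function names, the scalar weights `w_in` and `w_rec`, and the use of identity projections (queries, keys and values all equal to the input) are assumptions chosen to keep the example short. It only shows that a recurrent update must proceed step by step, whereas each self-attention output depends on the inputs alone.

```python
import math


def rnn_states(xs, w_in=0.5, w_rec=0.9):
    """Toy scalar recurrence: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h, states = 0.0, []
    for x in xs:
        # Step t needs h from step t-1, so the loop is inherently sequential.
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections (assumed)."""
    d = len(vectors[0])

    def attend(query):
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in vectors
        ]
        weights = softmax(scores)
        return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]

    # Every call to attend() reads only the inputs, never another output,
    # so the positions could be computed in any order or all at once.
    return [attend(q) for q in vectors]


if __name__ == "__main__":
    print(rnn_states([1.0, -2.0, 0.5]))
    print(self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Because no attention output feeds into another, reversing the input sequence simply reverses the outputs, which is one way to check that the positions are independent of each other.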

Highlights

  • Social media platforms such as Twitter are publicly accessible digital resources for online communication and collaboration

  • We propose DistilBERT, a streamlined version of Bidirectional Encoder Representations from Transformers (BERT), that uses only half the number of parameters of BERT [27] but retains the performance of BERT in many text processing tasks [33] while making inference 60% faster than BERT [34]

  • The results of the proposed DistilBERT method were compared against those of BERT, XLNet, RoBERTa and attention-based long short-term memory (LSTM)


Summary

Introduction

Social media platforms such as Twitter are publicly accessible digital resources for online communication and collaboration. Social media companies such as Twitter and Facebook employ human annotators to manually delete messages deemed hateful [3]. Users of these platforms are also encouraged to flag and report content they perceive to be inimical to the public. Machine learning algorithms fall into two broad categories: classical machine learning and deep learning. Both have been exploited and tested for hate speech detection in earlier studies.
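To make the classical category concrete, the following is a minimal sketch of a bag-of-words multinomial naive Bayes classifier with add-one smoothing. It is an illustrative assumption, not a method taken from the study: the class name `NaiveBayes`, the whitespace tokenizer, the two labels and the four toy sentences are all hypothetical, and the real corpus is multiclass.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the corpus used in the study.
    texts = [
        "you people are awful",
        "what a lovely morning",
        "awful awful people",
        "lovely day for a walk",
    ]
    labels = ["abusive", "neutral", "abusive", "neutral"]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("awful people"))
    print(model.predict("lovely walk"))
```

A model of this kind treats a tweet as an unordered bag of words, which is the main contrast with the sequence models discussed in the abstract.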

