Abstract

Online communities allow users to establish a web presence, manage their identities, and stay connected with others. The internet has facilitated global outreach with just a click on the World Wide Web. However, the current landscape of online social media platforms are marred by various issues, with hate speech prominently taking center stage. Hate speech is characterized by hostile and malicious language driven by prejudice, targeting individuals or groups based on their innate, natural, or perceived characteristics. Detecting such speech is crucial for maintaining a safe online environment. This study examines the impact of dataset regularization techniques on the performance of BERTimbau-based models when applied to four Portuguese hate speech datasets: Fortuna et al. (2019), OFFCOMBR-2, ToLD-BR, and Hate-BR. Four Data Augmentation techniques are evaluated: Oversampling, Undersampling, Text Augmentation, and Synonym Replacement. Our experiments revealed that, apart from the Fortuna et al. (2019) dataset, the Data Augmentation techniques did not significantly enhance the performance of hate speech detection tasks.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call