ABSTRACT
Over the past decade, the proliferation of hateful and sexist content targeting women on social media has become a concerning issue, adversely affecting women's lives and freedom of expression. Previous efforts to detect online sexism have utilized monolingual ensemble transformers combined with data augmentation techniques that incorporate related‐domain data, such as hate speech. However, these approaches often struggle to capture the full diversity and complexity of sexism due to limitations in the size and quality of training data. In this study, we introduce a novel sexism detection system that employs in‐domain unlabeled data through unsupervised task‐adaptation techniques and semi‐supervised learning, using an efficient single multilingual transformer model. Additionally, we incorporate a Sentence‐BERT layer to enhance our system with semantically meaningful sentence embeddings. Our proposed system outperforms existing state‐of‐the‐art methods across all tasks and datasets, demonstrating its effectiveness in detecting and addressing sexism in social media text. These results underscore the potential of our approach, providing a foundation for further research and practical applications.