Fine-grained Multi-label Sexism Classification Using Semi-supervised Learning

Harika Abburi,Pulkit Parikh,Niyati Chhaya,Vasudeva Varma

doi:10.1007/978-3-030-62008-0_37

Abstract

Sexism, a pervasive form of oppression, causes profound suffering through various manifestations. Given the rising number of experiences of sexism reported online, categorizing these recollections automatically can aid the fight against sexism, as it can facilitate effective analyses by gender studies researchers and government officials involved in policy making. In this paper, we explore the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, we consider substantially more categories of sexism than any related prior work through our 23-class problem formulation. Moreover, we present the first semi-supervised work for the multi-label classification of accounts describing any type(s) of sexism wherein the approach goes beyond merely fine-tuning pre-trained models using unlabeled data. We devise self-training based techniques tailor-made for the multi-label nature of the problem to utilize unlabeled samples for augmenting the labeled set. We identify high textual diversity with respect to the existing labeled set as a desirable quality for candidate unlabeled instances and develop methods for incorporating it into our approach. We also explore ways of infusing class imbalance alleviation for multi-label classification into our semi-supervised learning, independently and in conjunction with the method involving diversity. Several proposed methods outperform a variety of baselines on a recently released dataset for multi-label sexism categorization across several standard metrics.

Full Text