Sexist content has become increasingly prevalent on social media platforms, underscoring the need for efficient Automatic Sexism Detection methods. Previous literature reviews do not cover the advances in Automatic Sexism Detection made over the past three years. This study therefore presents a Systematic Literature Review (SLR) of 48 primary studies published between 2014 and 17 September 2024, retrieved from six bibliographic databases. The review covers the datasets, preprocessing techniques, feature extraction methods, text representations, classification approaches, and evaluation metrics employed in Automatic Sexism Detection research. It also discusses the findings, limitations, and future research directions of the selected articles, and summarizes the conclusions drawn from the review. The analysis reveals a lack of corpora in languages other than English and Spanish, and most datasets are annotated with binary misogyny/non-misogyny labels. Common preprocessing techniques include lowercase conversion, text removal, tokenization, stemming, and rewriting. Discrete representations, such as TF-IDF, N-grams, and BoW, are frequently used, while distributed representations, such as BERT and GloVe, are also prominent. BERT is the predominant classification model, and incorporating lexical features improves the results in the majority of the scenarios discussed. Accuracy (A) and F1 score (F1) are the most widely used evaluation metrics in this field.
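
As a small illustration of the two evaluation metrics named above, the following sketch computes accuracy and F1 for a binary sexist/non-sexist labelling. It is not taken from any of the reviewed studies: the function name `accuracy_and_f1`, the label encoding (1 = sexist, 0 = not sexist), and the toy labels are assumptions made purely for the example.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` marks the class of interest (assumed here: 1 = sexist).
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy data: 2 sexist posts out of 8, one of them missed.
y_true = [1, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

On imbalanced toy data like this, accuracy stays high even though half of the positive examples are missed, while F1 drops, which is one plausible reason the two metrics are reported together.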