Abstract

Misogyny is a severe social problem that harms women’s mental and physical health and can even lead to femicide. It is visible and prevalent across communication channels such as music and social media, which can reinforce or incite misogynistic behavior. Hence, there is increasing interest in automatically detecting misogynistic content in social media with computational methods that analyze language. Most approaches follow a supervised machine learning strategy, whose main challenge is capturing the diversity and complexity of the offensive language directed at women. The size and quality of the training data therefore play essential roles. In this context, we designed a novel data augmentation approach that leverages song phrases to improve the models’ ability to generalize and their performance. In addition, this paper introduces a methodology for compiling a labeled dataset of song segments that convey misogyny, which can be used to enrich different techniques in this field. The proposed approach was evaluated on English and Spanish benchmark datasets. It outperforms conventional transfer learning techniques and is highly competitive with state-of-the-art methods, outperforming them on the Spanish dataset. These encouraging results demonstrate the usefulness of the proposed approach.
