Abstract

This paper describes the development of an automatic classifier that evaluates and categorizes the level of sexism in the lyrics of urban music songs. The classifier assigns each lyric to one of three categories: "A", content suitable for audiences of all ages; "B", content requiring adult supervision; and "C", adult-oriented material. It was implemented in Python using five algorithms: Naïve Bayes, nearest neighbours, decision tree, support vector machine and logistic regression. To train the models, a dataset of 479 observations was created and split into 75% for training and 25% for testing. The training dataset included expressions both with and without sexist connotations. The most accurate model was the one based on logistic regression, which reached 77% accuracy. To ease deployment in production environments, the model was integrated with a graphical user interface that makes the system easier to use for potential beneficiaries.
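
The workflow summarized above (split the data 75/25, train a classifier, measure accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the libraries, features or preprocessing used, so this version relies only on the Python standard library and shows just one of the five listed algorithms, Naïve Bayes. The names `TOY_ROWS`, `tokenize`, `split_train_test`, `NaiveBayes` and `accuracy`, the whitespace tokenizer, the add-one smoothing and the choice to round the test share up are all assumptions made for illustration, and the toy rows are placeholders rather than data from the paper.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical placeholder rows (text, label); NOT taken from the paper's dataset.
TOY_ROWS = [
    ("sun dance friends", "A"),
    ("dance party friends", "A"),
    ("flirt night tease", "B"),
    ("tease flirt party", "B"),
    ("explicit crude insult", "C"),
    ("crude insult night", "C"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately naive tokenizer)."""
    return text.lower().split()


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and return (train, test); the test size is rounded up (assumption)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = math.ceil(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, rows):
        rows = list(rows)
        self.total = len(rows)
        self.class_counts = Counter(label for _, label in rows)
        self.word_counts = defaultdict(Counter)
        for text, label in rows:
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        # Words never seen during training carry no information and are skipped.
        words = [w for w in tokenize(text) if w in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total)  # log prior of the class
            for w in words:
                score += math.log((counts[w] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, rows):
    """Fraction of rows whose predicted label equals the true label."""
    rows = list(rows)
    return sum(model.predict(text) == label for text, label in rows) / len(rows)
```

With 479 observations, `split_train_test` under these assumptions yields 359 training rows and 120 test rows. The other four algorithms would follow the same fit, predict and score pattern, and the 77% figure reported for logistic regression corresponds to the kind of value `accuracy` computes on the held-out 25%.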
