Abstract

Analysis of subjective texts such as offensive content or hate speech is a great challenge, especially regarding the annotation process. Most current annotation procedures aim to achieve a high level of agreement in order to generate a high-quality reference source. However, annotation guidelines for subjective content may restrict the annotators’ freedom of decision-making. Motivated by the moderate annotation agreement in offensive content datasets, we hypothesize that personalized approaches to offensive content identification are needed. Thus, we propose two novel perspectives on perception: group-based and individual. Using annotators’ demographics as well as embeddings of their previous decisions (annotated texts), we train multimodal models (including transformer-based ones) adjusted to personal or community profiles. Based on the agreement of individuals and groups, we show experimentally that annotator group agreeability strongly correlates with offensive content recognition quality. The proposed personalized approaches enabled us to create models adaptable to personal user beliefs rather than to an agreed understanding of offensiveness. Overall, our individualized approaches to offensive content classification outperform classic data-centric methods that generalize offensiveness perception, and this holds for all six tested models. Additionally, we developed requirements for annotation procedures, personalization, and content processing to make the solutions human-centered.
