Abstract

Toxic comment classification models are often found biased towards identity terms, i.e., terms characterizing a specific group of people such as “Muslim” and “black”. Such bias is commonly reflected in false positive predictions, i.e., non-toxic comments with identity terms. In this work, we propose a novel approach to debias the model in toxic comment classification, leveraging the notion of subjectivity level of a comment and the presence of identity terms. We hypothesize that toxic comments containing identity terms are more likely to be expressions of subjective feelings or opinions. Therefore, the subjectivity level of a comment containing identity terms can be helpful for classifying toxic comments and mitigating the identity term bias. To implement this idea, we propose a model based on BERT and study two different methods of measuring the subjectivity level. The first method uses a lexicon-based tool. The second method is based on the idea of calculating the embedding similarity between a comment and a relevant Wikipedia text of the identity term in the comment. We thoroughly evaluate our method on an extensive collection of four datasets collected from different social media platforms. Our results show that: (1) our models that incorporate both features of subjectivity and identity terms consistently outperform strong SOTA baselines, with our best performing model achieving an improvement in F1 of 4.75% over a Twitter dataset; (2) our idea of measuring subjectivity based on the similarity to the relevant Wikipedia text is very effective on toxic comment classification as our model using this has achieved the best performance on 3 out of 4 datasets while obtaining comparative performance on the remaining dataset. We further test our method on RoBERTa to evaluate the generality of our method and the results show the biggest improvement in F1 of up to 1.29% (on a dataset from a white supremacist online forum).

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.