Abstract


Toxic comments are a prevalent issue on online social media and networking platforms. These comments contain offensive, malicious, hateful, or otherwise harmful content that negatively impacts audiences and communities. Effectively detecting and categorizing toxic comments is essential for maintaining order in the online environment, protecting user safety, and enhancing user experience. Researchers and companies have developed various models to recognize toxicity in online chats and comments, achieving some success. However, many currently used models incorrectly classify non-toxic comments that contain certain identity terms as potentially toxic. This misclassification hinders the accurate identification and categorization of comments. In this paper, the detection and classification of toxic comments are implemented using Term Frequency-Inverse Document Frequency (TF-IDF) and machine learning techniques. In addition, two dataset-specific optimizations are proposed to mitigate the impact of bias on text classification by expanding the dataset. A comparative analysis of bias evaluation metrics demonstrates that this approach effectively mitigates bias while preserving the accuracy of the original model as far as possible.
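
To illustrate the core pipeline described above, a minimal sketch of a TF-IDF plus machine-learning toxicity classifier is shown below. The abstract does not specify the exact classifier, features, or data used, so the scikit-learn components, hyperparameters, and toy examples here are assumptions for illustration only.

```python
# Minimal sketch of a TF-IDF + machine-learning toxicity classifier.
# Illustrative only: the paper does not name its exact models or
# settings, so the choices below are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data standing in for a labeled toxic-comment corpus.
comments = [
    "You are a wonderful person",
    "I hate you, get lost",
    "Thanks for the helpful answer",
    "You are an idiot",
]
labels = [0, 1, 0, 1]  # 0 = non-toxic, 1 = toxic

# TF-IDF turns each comment into a weighted term vector;
# a linear classifier then separates toxic from non-toxic vectors.
model = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(comments, labels)

print(model.predict(["what a great idea", "you idiot"]))  # e.g. [0 1]
```

A dataset-expansion step such as the one the abstract proposes would augment `comments` and `labels` (for example, with additional non-toxic examples containing identity terms) before `fit` is called.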
