Analysis of Results and Directions for Improvement of Korean Hate Speech and Bias Detection Models for Malicious Comments

Seyoung Lee, Saerom Park

doi:10.7232/jkiie.2022.48.6.636

Abstract

With the development of Internet communication technology, people can freely express their opinions on many issues online. However, some people abuse this freedom of expression and cause psychological harm by writing comments that express hatred toward others. To address this problem, researchers have actively studied the automatic detection of malicious comments with machine learning models. In this study, we built hate speech and bias detection models for the KOCO (KOrean hate COmments) dataset using widely used text classification models: logistic regression on term frequency-inverse document frequency (TF-IDF) features, KoBERT, KoELECTRA, KcELECTRA, and KoGPT2. Our experiments show that sentence length, whether context information is reflected, and mislabeled data strongly affect the classification performance of most models. Based on these findings, we present considerations for the automatic detection of malicious comments and directions for constructing comment datasets that would improve detection models in future research.
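The abstract names logistic regression on TF-IDF features as the non-neural baseline. The following is a minimal, dependency-free sketch of what such a baseline computes. It is not the authors' implementation. The character-bigram features, the smoothed IDF formula (the one scikit-learn uses by default), the gradient-descent settings, and the four Korean toy comments are all illustrative assumptions, not KOCO data or the paper's configuration.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams with spaces removed (assumed feature choice, not the paper's)."""
    s = text.replace(" ", "")
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def fit_idf(docs):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1, scikit-learn's default formula."""
    df = Counter()
    for d in docs:
        df.update(set(char_ngrams(d)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(text, idf):
    """Raw counts times IDF, L2-normalised; n-grams unseen in training are dropped."""
    tf = Counter(t for t in char_ngrams(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """P(label = 1 | x) for a sparse feature dict x."""
    return sigmoid(b + sum(w[t] * v for t, v in x.items()))


def train_logreg(X, y, epochs=200, lr=0.5, l2=1e-3):
    """Batch gradient descent on L2-regularised log-loss (hypothetical settings)."""
    w, b = Counter(), 0.0  # Counter returns 0.0 for features with no weight yet
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, label in zip(X, y):
            err = predict_proba(x, w, b) - label
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t in set(w) | set(grad_w):
            w[t] -= lr * (grad_w[t] / m + l2 * w[t])
        b -= lr * grad_b / m
    return w, b


# Hypothetical toy comments: 1 = offensive, 0 = not offensive.
texts = ["너 진짜 멍청하다", "이 사람 정말 쓰레기네", "좋은 기사 감사합니다", "응원합니다 화이팅"]
labels = [1, 1, 0, 0]

idf = fit_idf(texts)
X = [tfidf(t, idf) for t in texts]
w, b = train_logreg(X, labels)
print(round(predict_proba(tfidf("진짜 멍청한 댓글", idf), w, b), 3))
```

Character n-grams are used here because they need no Korean morphological analyzer and tolerate the irregular spacing common in comments; a word- or morpheme-level vectorizer is an equally plausible reading of the abstract. Note also that `tfidf` silently drops n-grams it never saw in training, which is one simple way such a baseline can lose information on short or unusually spelled comments.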
