Abstract
Offensive language in social media affects the social experience of individuals and groups and hurts social harmony and moral values. Therefore, in recent years, the problem of offensive language detection has attracted the attention of many researchers. However, the primary research currently focuses on detecting English offensive language, while few studies on the Chinese language exist. In this paper, we propose an innovative approach to detect Chinese offensive language. First, unlike previous approaches, we utilized both RoBERTa’s sentence-level and word-level embedding, combining the sentence embedding and word embedding of RoBERTa’s model, bidirectional GRU, and multi-head self-attention mechanism. This feature fusion allows the model to consider sentence-level and word-level semantic information at the same time so as to capture the semantic information of Chinese text more comprehensively. Second, by concatenating the output results of multi-head attention with RoBERTa’s sentence embedding, we achieved an efficient fusion of local and global information and improved the representation ability of the model. The experiments showed that the proposed model achieved 82.931% accuracy and 82.842% F1-score in Chinese offensive language detection tasks, delivering high performance and broad application potential.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.