Abstract

This article describes the system we submitted to SemEval-2020 Task 12, OffensEval 2: Multilingual Offensive Language Recognition in Social Media. The task is to identify offensive language in social media posts. The shared task covers five languages (English, Greek, Arabic, Danish, and Turkish) and three subtasks. We participated only in subtask A for English, which asks systems to identify offensive language. Our system is based on a Bidirectional Gated Recurrent Unit (Bi-GRU) combined with a Capsule model, and we used a K-fold approach to build the ensemble. Our model achieved a macro-averaged F1 score of 0.90969 (ranked 27/85) in subtask A.
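The abstract does not say how the K-fold models are combined, so the following is only a minimal sketch of one common scheme: split the training data into K folds, train one model per fold, and average the per-fold probabilities on the test set. The helper names (`k_fold_indices`, `ensemble_predict`), the 0.5 threshold, and the probability values are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, validation) index pairs."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, val))
        start += size
    return folds


def ensemble_predict(fold_probs, threshold=0.5):
    """Average each tweet's P(OFF) over all fold models, then threshold the mean.

    Averaging is an assumed combination rule; majority voting would also fit
    the description "K-fold approach for ensemble".
    """
    labels = []
    for probs in zip(*fold_probs):  # one tuple of fold scores per test tweet
        labels.append("OFF" if mean(probs) >= threshold else "NOT")
    return labels


# Hypothetical P(OFF) scores from 3 fold models on 4 test tweets.
fold_probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.3],
    [0.7, 0.3, 0.7, 0.2],
]
folds = k_fold_indices(10, 3)
predictions = ensemble_predict(fold_probs)
```

In this sketch each fold model would be trained on `train` and validated on `val`; only the combination step is shown, since the training procedure itself is not described in this section.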

Highlights

  • Offensive language is ubiquitous in social media, and individuals often use the anonymity of computer-mediated communication to engage in anti-social online behaviors, including cyberbullying (Xu et al., 2012), malicious provocation (Kwok and Wang, 2013), and offensive language (Cheng et al., 2017)

  • SemEval-2020 OffensEval 2 was proposed for multilingual offensive language recognition in social media (Zampieri et al., 2020)

  • Our model uses a bidirectional gated recurrent unit (Bi-GRU) (Bahdanau et al., 2014) to process the sequence in both directions, drawing on both the preceding and the following context. A capsule is a group of neurons that represents parameters as a vector, and the capsule network clusters the input features using an inner-product method
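The inner-product clustering mentioned in the last highlight can be read as routing by agreement, the scheme commonly used in capsule networks. The sketch below illustrates that reading in plain Python, assuming prediction vectors `u_hat` have already been computed from the Bi-GRU outputs. The function names, the number of routing iterations, and the toy input are assumptions for illustration; the paper's actual layer sizes and routing details are not given in this section.

```python
import math


def squash(v):
    """Scale vector v so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in v)
    if sq_norm == 0.0:
        return [0.0] * len(v)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, n_iters=3):
    """Route input capsules to output capsules by agreement.

    u_hat[i][j] is the prediction vector from input capsule i for output capsule j.
    Returns one squashed vector per output capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(n_iters):
        c = [softmax(row) for row in b]  # coupling coefficients per input capsule
        v = []
        for j in range(n_out):
            # Weighted sum of predictions for output capsule j, then squash.
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        # Agreement is the inner product between a prediction and the output capsule;
        # predictions that agree with the cluster get a larger logit next round.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * o for a, o in zip(u_hat[i][j], v[j]))
    return v


# Toy input: 3 input capsules, 2 output capsules, 2-dimensional vectors.
u_hat = [
    [[1.0, 0.0], [0.0, 0.2]],
    [[0.9, 0.1], [0.1, 0.1]],
    [[0.8, -0.1], [-0.2, 0.3]],
]
outputs = dynamic_routing(u_hat)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in outputs]
```

Because of the squash step, each value in `lengths` stays below 1 and can be interpreted as a confidence that the corresponding output capsule is active.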


Summary

Introduction

Offensive language is ubiquitous in social media, and individuals often use the anonymity of computer-mediated communication to engage in anti-social online behaviors, including cyberbullying (Xu et al., 2012), malicious provocation (Kwok and Wang, 2013), and offensive language (Cheng et al., 2017). The widespread dissemination of offensive content in social media is a cause of concern for governments and many technology companies around the world. One of the most common and effective strategies for tackling offensive language online is to train systems that can recognize such content. SemEval-2020 OffensEval 2 was proposed for multilingual offensive language recognition in social media (Zampieri et al., 2020). Participating systems must classify each tweet into one of two categories: Offensive (OFF) or Not Offensive (NOT). In this competition, we participated only in subtask A for English.
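Subtask A is scored with the macro-averaged F1 over the two labels, which is the metric reported in the abstract. As a minimal sketch, the computation can be written as follows; the function names and the six example labels are made up for illustration and are not data from the task.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("OFF", "NOT")):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical gold labels and system predictions for six tweets.
y_true = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
y_pred = ["OFF", "NOT", "OFF", "OFF", "NOT", "NOT"]
score = macro_f1(y_true, y_pred)
```

Because the two per-label scores are averaged without weighting, the minority OFF class counts as much as the majority NOT class, which matters when the label distribution is imbalanced.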

Related Work
Data description
Data preprocessing
Bi-GRU with a Capsule model
K-folding ensemble
Ablation experiment
Experiment setting
Result
Findings
Conclusion
