Abstract
This paper presents our hierarchical multi-task learning (HMTL) and multi-task learning (MTL) approaches for improving the text encoder in Sub-tasks A, B, and C of Multilingual Offensive Language Identification in Social Media (SemEval-2020 Task 12). We show that the MTL approach can greatly improve performance on the more complex problems, i.e., Sub-tasks B and C. Coupled with a hierarchical architecture, performance improves further. Overall, our best model, HMTL, outperforms the baseline by 3% and 2% in Macro F-score on Sub-tasks B and C of OffensEval 2020, respectively.
Highlights
Multilingual Offensive Language Identification in Social Media (OffensEval 2020), organized by Zampieri et al. (2020), is a popular competition attracting many teams.
To leverage the hierarchical nature of the three subtasks in OffensEval 2020, we further propose the hierarchical multi-task learning (HMTL) model, which combines BERT with a hierarchical multi-task learning (MTL) architecture.
The HMTL model achieves the best performance in Sub-task B, with a macro F-score of 0.7417.
Summary
Multilingual Offensive Language Identification in Social Media (OffensEval 2020), organized by Zampieri et al. (2020), is a popular competition attracting many teams. The task is based on the Offensive Language Identification Dataset (OLID) 1.0 and a new dataset, SOLID. In Sub-task A, the goal is to identify whether a given tweet is offensive or non-offensive. In Sub-task B, the focus is to determine whether the offensive content in a tweet is targeted or untargeted. In Sub-task C, systems must detect the type of target in an offensive tweet.
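The three subtasks form a label hierarchy: Sub-task B applies only to tweets labeled offensive in A, and Sub-task C only to tweets labeled targeted in B. A minimal sketch of this hierarchical routing at inference time, using the OLID label set (OFF/NOT, TIN/UNT, IND/GRP/OTH) but with placeholder keyword rules standing in for the paper's BERT-based models (all function names and rules below are illustrative assumptions, not the authors' code):

```python
def predict_subtask_a(tweet):
    # Placeholder for a trained Sub-task A model:
    # offensive (OFF) vs. non-offensive (NOT).
    return "OFF" if "idiot" in tweet.lower() else "NOT"

def predict_subtask_b(tweet):
    # Placeholder for Sub-task B: targeted insult (TIN) vs. untargeted (UNT).
    return "TIN" if "you" in tweet.lower() else "UNT"

def predict_subtask_c(tweet):
    # Placeholder for Sub-task C: target is an individual (IND),
    # a group (GRP), or other (OTH).
    return "IND"

def hierarchical_predict(tweet):
    """Route a tweet through the A -> B -> C label hierarchy.

    B is only predicted for offensive tweets, and C only for targeted
    ones; lower levels are None when the parent label rules them out.
    """
    a = predict_subtask_a(tweet)
    b = predict_subtask_b(tweet) if a == "OFF" else None
    c = predict_subtask_c(tweet) if b == "TIN" else None
    return {"A": a, "B": b, "C": c}
```

In the paper's HMTL model this conditioning happens inside the network via a shared encoder and hierarchically arranged task heads, but the routing of labels between subtasks follows the same dependency structure sketched here.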