Abstract

This paper presents our hierarchical multi-task learning (HMTL) and multi-task learning (MTL) approaches for improving the text encoder in Sub-tasks A, B, and C of Multilingual Offensive Language Identification in Social Media (SemEval-2020 Task 12). We show that the MTL approach can greatly improve performance on the more complex problems, i.e. Sub-tasks B and C. Coupled with a hierarchical approach, performance improves further. Overall, our best model, HMTL, outperforms the baseline model by 3% and 2% macro F-score in Sub-tasks B and C of OffensEval 2020, respectively.

Highlights

  • Multilingual Offensive Language Identification in Social Media (OffensEval 2020), organized by Zampieri et al. (2020), is a popular competition attracting many teams.

  • To leverage the hierarchical nature of the three subtasks in OffensEval 2020, we further propose the hierarchical multi-task learning (HMTL) model, which extends BERT with a hierarchical multi-task learning (MTL) architecture.

  • The HMTL model has the best performance in Sub-task B, achieving a macro F-score of 0.7417.


Introduction

Multilingual Offensive Language Identification in Social Media (OffensEval 2020), organized by Zampieri et al. (2020), is a popular competition attracting many teams. The task is based on the Offensive Language Identification Dataset (OLID) 1.0 and a new dataset, SOLID. In Sub-task A, the goal is to identify whether a given tweet is offensive or non-offensive. In Sub-task B, the focus is to identify whether the offensive content in a tweet is targeted or untargeted. In Sub-task C, systems have to detect the type of target in an offensive tweet.
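The three subtasks form a decision cascade: Sub-task B applies only to tweets labeled offensive in Sub-task A, and Sub-task C applies only to tweets labeled targeted in Sub-task B. A minimal sketch of that inference flow, using OLID's label names and placeholder classifier arguments (hypothetical stand-ins for the model's task-specific heads, not the paper's actual implementation):

```python
def classify_tweet(tweet, is_offensive, is_targeted, target_type):
    """Hierarchical inference over the three OffensEval subtasks.

    The three classifier arguments are hypothetical placeholders for
    the model's task-specific prediction heads.
    """
    # Sub-task A: offensive (OFF) vs. non-offensive (NOT)
    if not is_offensive(tweet):
        return ("NOT", None, None)
    # Sub-task B: targeted (TIN) vs. untargeted (UNT) -- offensive tweets only
    if not is_targeted(tweet):
        return ("OFF", "UNT", None)
    # Sub-task C: target type (IND / GRP / OTH) -- targeted tweets only
    return ("OFF", "TIN", target_type(tweet))

# Toy usage with trivial stand-in classifiers:
labels = classify_tweet(
    "example tweet",
    is_offensive=lambda t: True,
    is_targeted=lambda t: True,
    target_type=lambda t: "IND",
)
# labels == ("OFF", "TIN", "IND")
```

The HMTL model exploits this same structure at training time by arranging the task heads hierarchically over a shared encoder, rather than treating the three labels as independent outputs.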
