Abstract
This paper presents our hierarchical multi-task learning (HMTL) and multi-task learning (MTL) approaches for improving the text encoder in Sub-tasks A, B, and C of Multilingual Offensive Language Identification in Social Media (SemEval-2020 Task 12). We show that the MTL approach can greatly improve performance on the more complex problems, i.e., Sub-tasks B and C. Coupled with a hierarchical architecture, performance improves further. Overall, our best model, HMTL, outperforms the baseline by 3% and 2% in Macro F-score on Sub-tasks B and C of OffensEval 2020, respectively.
Highlights
Multilingual Offensive Language Identification in Social Media (OffensEval 2020), organized by Zampieri et al. (2020), is a popular competition attracting many teams.
To leverage the hierarchical nature of the three subtasks in OffensEval 2020, we further propose the hierarchical multi-task learning (HMTL) model, which combines BERT with a hierarchical multi-task learning (MTL) architecture.
The HMTL model achieves the best performance in Sub-task B, with a macro F-score of 0.7417.
Summary
Multilingual Offensive Language Identification in Social Media (OffensEval 2020), organized by Zampieri et al. (2020), is a popular competition attracting many teams. The task is based on the Offensive Language Identification Dataset (OLID) 1.0 and a new dataset, SOLID. In Sub-task A, the goal is to identify whether a given tweet is offensive or non-offensive. In Sub-task B, the focus is to determine whether the offensive content in a tweet is targeted or untargeted. In Sub-task C, systems must detect the type of target in an offensive tweet.
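The three subtasks form a label hierarchy: Sub-task B applies only to tweets labeled offensive in A, and Sub-task C only to tweets labeled targeted in B. A minimal sketch of this hierarchical routing at inference time, using the OLID label set (OFF/NOT, TIN/UNT, IND/GRP/OTH) but with placeholder keyword rules standing in for the paper's BERT-based models (all function names and rules below are illustrative assumptions, not the authors' code):

```python
def predict_subtask_a(tweet):
    # Placeholder for a trained Sub-task A model:
    # offensive (OFF) vs. non-offensive (NOT).
    return "OFF" if "idiot" in tweet.lower() else "NOT"

def predict_subtask_b(tweet):
    # Placeholder for Sub-task B: targeted insult (TIN) vs. untargeted (UNT).
    return "TIN" if "you" in tweet.lower() else "UNT"

def predict_subtask_c(tweet):
    # Placeholder for Sub-task C: target is an individual (IND),
    # a group (GRP), or other (OTH).
    return "IND"

def hierarchical_predict(tweet):
    """Route a tweet through the A -> B -> C label hierarchy.

    B is only predicted for offensive tweets, and C only for targeted
    ones; lower levels are None when the parent label rules them out.
    """
    a = predict_subtask_a(tweet)
    b = predict_subtask_b(tweet) if a == "OFF" else None
    c = predict_subtask_c(tweet) if b == "TIN" else None
    return {"A": a, "B": b, "C": c}
```

In the paper's HMTL model this conditioning happens inside the network via a shared encoder and hierarchically arranged task heads, but the routing of labels between subtasks follows the same dependency structure sketched here.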