Abstract

This article describes Amobee’s participation in SemEval-2019 Task 5, “HatEval: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter”, and Task 6, “OffensEval: Identifying and Categorizing Offensive Language in Social Media”. The goal of Task 5 was to detect hate speech targeted at women and immigrants; the goal of Task 6 was to identify and categorize offensive language in social media and to identify the offense target. We present a novel type of convolutional neural network called “Multiple Choice CNN” (MC-CNN), applied on top of our newly developed contextual embedding (Rozental et al., 2019). Using this architecture for both tasks, we achieved 4th place out of 69 participants in Task 5 with an F1 score of 0.53, and 2nd place (out of 75) in Task 6 Sub-task B, the automatic categorization of offense types; overall, our model ranked 18th, 2nd and 7th out of 103, 75 and 65 participants in Task 6 Sub-tasks A, B and C, respectively.

Highlights

  • Offensive language and hate speech identification are sub-fields of natural language processing that explore the automatic inference of offensive language and hate speech, along with its target, from textual data

  • This paper describes our system for the OffensEval 2019 and HatEval 2019 tasks, where our new contribution is the use of contextual embedding together with an appropriate network architecture for such embeddings

  • We chose to use this architecture for both tasks because we believe that the Bidirectional Encoder Representations from Transformers (BERT) model output contains most of the information about the tweet

Summary

Introduction

Offensive language and hate speech identification are sub-fields of natural language processing that explore the automatic inference of offensive language and hate speech, along with its target, from textual data. Their unique insights are relevant for business intelligence, marketing and e-governance, and this data benefits NLP tasks such as sentiment analysis, offensive language detection and topic extraction. Both the OffensEval 2019 task (Zampieri et al. (2019b)) and the HatEval 2019 task are part of the SemEval-2019 workshop. Pre-trained BERT representations can be fine-tuned to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. This paper describes our system for the OffensEval 2019 and HatEval 2019 tasks, where our new contribution is the use of a contextual embedding (a modified BERT) together with an appropriate network architecture for such embeddings.
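The details of the MC-CNN are not given in this summary, so the following is only an illustrative sketch of the general pattern it builds on: a CNN head with filters of several widths applied over a sequence of contextual token embeddings (such as BERT output), max-pooled over time and fed to a linear classifier. All names, dimensions and filter widths here are assumptions for illustration, not the authors’ exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_maxpool(embeddings, filters):
    """Apply a bank of 1D conv filters over the token axis and max-pool.

    embeddings: (seq_len, dim) contextual token vectors
    filters:    (n_filters, width, dim)
    returns:    (n_filters,) pooled features
    """
    n_filters, width, dim = filters.shape
    seq_len = embeddings.shape[0]
    feats = np.empty(n_filters)
    for f in range(n_filters):
        # slide the filter over every window of `width` tokens
        acts = [np.sum(embeddings[i:i + width] * filters[f])
                for i in range(seq_len - width + 1)]
        feats[f] = max(acts)  # max-pool over time
    return feats

def cnn_head(embeddings, filter_banks, W, b):
    """Concatenate pooled features from banks of different widths, then
    apply a linear layer to produce class logits."""
    pooled = np.concatenate([conv1d_maxpool(embeddings, fb)
                             for fb in filter_banks])
    return pooled @ W + b

# Hypothetical setup: 32 tokens of BERT-base-sized (768-d) embeddings,
# 4 filters each of widths 2, 3 and 4, binary classification.
seq_len, dim = 32, 768
emb = rng.standard_normal((seq_len, dim))
banks = [rng.standard_normal((4, w, dim)) * 0.01 for w in (2, 3, 4)]
W = rng.standard_normal((12, 2)) * 0.1  # 3 banks x 4 filters = 12 features
b = np.zeros(2)

logits = cnn_head(emb, banks, W, b)
print(logits.shape)  # one logit per class
```

In a trained system the filters and the linear layer would be learned jointly, typically while fine-tuning or freezing the underlying contextual encoder; this sketch only shows the shape of the computation.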
