Abstract

Offensive language on today's social media can harm the mental health of minors. This paper aims to identify offensive content in posts published on the Twitter social network. More concretely, we categorize the text of tweets according to the characteristics they meet: offensive vs. non-offensive tweets, untargeted tweets vs. tweets targeted at someone, and, more specifically, tweets targeted at an individual, at a group of people, or at another category (i.e. an organization, a situation, an event, or an issue). This multi-level hierarchical categorization behaves like a top-down decision-tree process that classifies tweets against a tree-like ontological taxonomy. This offensiveness taxonomy defines the above-mentioned categories of offensive tweets. We use an unsupervised neural network for this hierarchical categorization, applied to the OLID (Offensive Language Identification Dataset) dataset. OLID consists of tweets and was offered as the benchmark for Task 6 of the SemEval 2019 competition, the task that inspired our paper.

Keywords: Categorization of offensive content; Text categorization; Unsupervised neural network; Offensiveness ontology
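The top-down process described in the abstract can be illustrated with a minimal Python sketch. It only shows the control flow of the three-level cascade: a tweet reaches the next level only if the previous level assigned the relevant label. The function `categorize`, the label strings, and the keyword-based `toy_*` classifiers are hypothetical placeholders introduced for illustration; they are not the paper's unsupervised neural network.

```python
from typing import Callable, Optional, Tuple

# Hypothetical type: a per-level classifier maps tweet text to a label string.
Classifier = Callable[[str], str]


def categorize(
    tweet: str,
    is_offensive: Classifier,
    is_targeted: Classifier,
    target_type: Classifier,
) -> Tuple[str, Optional[str], Optional[str]]:
    """Walk the offensiveness taxonomy from the root down to a leaf."""
    # Level 1: offensive vs. non-offensive.
    if is_offensive(tweet) != "offensive":
        return ("non-offensive", None, None)
    # Level 2: only offensive tweets are checked for a target.
    if is_targeted(tweet) != "targeted":
        return ("offensive", "untargeted", None)
    # Level 3: only targeted tweets get a target category
    # ("individual", "group", or "other").
    return ("offensive", "targeted", target_type(tweet))


# Toy keyword stand-ins (assumptions for the demo, not real models).
def toy_offensive(tweet: str) -> str:
    return "offensive" if "idiot" in tweet.lower() else "non-offensive"


def toy_targeted(tweet: str) -> str:
    words = tweet.lower().split()
    return "targeted" if any(w in ("you", "they") for w in words) else "untargeted"


def toy_target_type(tweet: str) -> str:
    words = tweet.lower().split()
    if "you" in words:
        return "individual"
    if "they" in words:
        return "group"
    return "other"


if __name__ == "__main__":
    for text in ["You are an idiot", "What an idiot move", "Nice weather today"]:
        print(text, "->", categorize(text, toy_offensive, toy_targeted, toy_target_type))
```

Passing the three classifiers as arguments keeps the cascade independent of how each level is modeled, so any trained model exposing the same text-to-label interface could replace the toy functions.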
