A dataset for Sentiment analysis of Entities in News headlines (SEN)

Katarzyna Baraniak, Marcin Sydow

doi:10.1016/j.procs.2021.09.136

Abstract

On-line news portals play a very important role in the information society. Fair media should present reliable and objective information. In practice, however, there is an observable positive or negative bias concerning named entities (e.g. politicians) mentioned in on-line news headlines. In this paper we present SEN, a novel, publicly available, human-labelled dataset for training and testing machine learning algorithms for sentiment analysis of entities in news headlines. It consists of 3819 human-labelled political news headlines from several major on-line media outlets in English and Polish. We also describe the process of preparing the dataset and present its analysis, including entity and annotator bias analysis, and some insights into possible challenges of entity-level analysis of news.

Full Text