SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news

Gilles Jacobs, Véronique Hoste

doi:10.1007/s10579-021-09562-4

Journal: Language Resources and Evaluation
Publication Date: Oct 8, 2021
Citations: 23
License type: open-access

Affiliation: Ghent University

Abstract

We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. The domain of event processing is highly productive, and various general-domain, fine-grained event extraction corpora are freely available, but economically focused resources are lacking. This work fills a large need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled and an annotation scheme is developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE) so that state-of-the-art event extraction systems can be readily applied. This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance is adequate, indicating that the annotation scheme produces consistent event annotations of high quality. In an event detection pilot study, satisfactory results were obtained with a macro-averaged F1-score of 59%, validating the dataset for machine learning purposes. This dataset thus provides a rich resource on events as training data for supervised machine learning for economic and financial applications. The dataset and related source code are made available at https://osf.io/8jec2/.
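The abstract reports a macro-averaged F1-score for event detection. As a rough illustration of what that metric measures, the sketch below computes an F1-score per event type and then takes the unweighted mean over types. It assumes sentence-level, multi-label detection, where each sentence carries a set of event type labels. The function name `macro_f1` and the labels `Profit` and `Dividend` are chosen for illustration only; this is not the evaluation code released with the dataset.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores.

    gold, pred: lists of sets, one set of event type labels per sentence.
    (Assumed input shape; the paper's own evaluation setup may differ.)
    """
    labels = sorted(set().union(*gold, *pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative toy data: four sentences with hypothetical event type labels.
gold = [{"Profit"}, {"Dividend"}, {"Profit", "Dividend"}, set()]
pred = [{"Profit"}, set(), {"Profit"}, {"Dividend"}]
score = macro_f1(gold, pred)
```

Because every event type contributes equally to the mean, rare types weigh as much as frequent ones, so a macro-averaged score can sit well below a micro-averaged score on the same predictions.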

Highlights

In the economic domain, information extraction from text is highly popular for making available fundamental knowledge present in economic text, such as business news (Day & Lee, 2016; Khedr et al., 2017), regulatory disclosures (Cavar & Josefy, 2018; Feuerriegel & Gordon, 2018), and social media (Gemar & Jiménez-Quintero, 2015; Oliveira et al., 2017).
This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality.
In order to allow for data-driven supervised event extraction for company-specific news, we discuss in this work the construction of the SENTiVENT Economic Event corpus, a representative corpus enabling both NLP and market research.

Summary

Introduction

Information extraction from text is highly popular for making available fundamental knowledge present in economic text, such as business news (Day & Lee, 2016; Khedr et al., 2017), regulatory disclosures (Cavar & Josefy, 2018; Feuerriegel & Gordon, 2018), and social media (Gemar & Jiménez-Quintero, 2015; Oliveira et al., 2017). Extracting factual data (using named entity recognition, relation extraction, and event extraction) or subjectivity data (using sentiment analysis) from economic text can enhance the available numerical data on markets with fundamental information for financial applications. Events encapsulate new information on the market, and automatically collecting novel event data has applications in stock prediction (Bholat et al., 2015; Chen et al., 2019; Nardo et al., 2016; Nassirtoussi et al., 2014; Zhang et al., 2018), risk analysis (Hogenboom et al., 2015; Wei et al., 2019), policy assessment (Karami et al., 2018; Tobback et al., 2018), brand management (De Clercq et al., 2017; Geetha et al., 2017), and marketing (Rambocas & Pacheco, 2018) (Fig. 1). In order to allow for data-driven supervised event extraction for company-specific news, we discuss in this work the construction of the SENTiVENT Economic Event corpus, a representative corpus enabling both NLP and market research. The SENTiVENT annotation scheme aims to be compatible with the benchmark ACE/ERE event datasets.
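To make the annotation layers concrete, here is a minimal sketch of how one ACE/ERE-style event mention might be held in memory: a trigger span, typed participant arguments, and the attributes named in the abstract (type, subtype, negation, modality, co-reference). All class names, field names, role labels, and attribute values below are hypothetical and chosen for illustration; they are not the file format or label set of the released corpus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


@dataclass
class Argument:
    role: str   # hypothetical role label, e.g. "Company"
    span: Span


@dataclass
class EventMention:
    trigger: Span
    event_type: str
    subtype: Optional[str] = None
    negated: bool = False
    modality: str = "certain"       # hypothetical value set
    arguments: List[Argument] = field(default_factory=list)
    coref_id: Optional[str] = None  # mentions sharing an id co-refer


def span_of(sentence: str, text: str) -> Span:
    """Locate the first occurrence of `text` in `sentence` as a Span."""
    start = sentence.index(text)
    return Span(start, start + len(text), text)


# Invented example sentence; "Acme" is a placeholder company name.
sentence = "Acme did not raise its dividend."
event = EventMention(
    trigger=span_of(sentence, "raise"),
    event_type="Dividend",
    subtype="Raise",
    negated=True,
    arguments=[Argument("Company", span_of(sentence, "Acme"))],
    coref_id="e1",
)
```

Keeping character offsets alongside the surface text lets a consumer check every span against the source sentence, which is useful when converting between annotation formats.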
