Catching the Common Cause: Extraction and Annotation of Causal Relations and their Participants

Ines Rehbein, Josef Ruppenhofer

doi:10.18653/v1/w17-0813

Abstract

In this paper, we present a simple yet effective method for the automatic identification and extraction of causal relations from text, based on a large English-German parallel corpus. The goal of this effort is to create a lexical resource for German causal relations. The resource will consist of a lexicon that describes constructions that trigger causality as well as the participants of the causal event, and will be augmented by a corpus with annotated instances for each entry that can be used as training data to develop a system for the automatic classification of causal relations. Focusing on verbs, our method harvested a set of 100 different lexical triggers of causality, including support verb constructions. At the moment, our corpus includes over 1,000 annotated instances. The lexicon and the annotated data will be made available to the research community.

Highlights

Causality is an important concept that helps us to make sense of the world around us
We describe our method for automatically identifying new causal triggers from text, based on parallel corpora
Using a strong causal trigger and further extraction constraints, such as restricting the candidate set to sentences in which both a subject and a direct object NP are linked to the target predicate, we are able to guide the extraction towards instances that are, to a large degree, causal
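
The syntactic constraint mentioned in the last highlight can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` class, the `is_candidate` function, the Universal Dependencies-style labels (`nsubj`, `obj`) and the set of nominal part-of-speech tags are all assumptions made for illustration, since this summary does not specify the parser or label set that was used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """One token of a dependency-parsed sentence (hypothetical structure)."""
    form: str
    pos: str     # coarse part-of-speech tag, e.g. "NOUN", "VERB"
    head: int    # index of the governing token, -1 for the root
    deprel: str  # dependency label, e.g. "nsubj", "obj"


# Assumption: these tags count as heads of noun phrases.
NOMINAL_POS = {"NOUN", "PROPN", "PRON"}


def is_candidate(tokens: List[Token], predicate: int) -> bool:
    """Return True if the predicate at index `predicate` directly governs
    both a nominal subject and a nominal direct object."""
    labels = {
        tok.deprel
        for tok in tokens
        if tok.head == predicate and tok.pos in NOMINAL_POS
    }
    return {"nsubj", "obj"} <= labels


# "Smoking causes cancer": subject and direct object both attach to "causes".
causal_sentence = [
    Token("Smoking", "NOUN", 1, "nsubj"),
    Token("causes", "VERB", -1, "root"),
    Token("cancer", "NOUN", 1, "obj"),
]

# "It rains": there is no direct object, so the sentence is filtered out.
intransitive_sentence = [
    Token("It", "PRON", 1, "nsubj"),
    Token("rains", "VERB", -1, "root"),
]

print(is_candidate(causal_sentence, 1))
print(is_candidate(intransitive_sentence, 1))
```

A filter of this kind only narrows the candidate set; whether a retained sentence actually expresses causality still has to be decided by annotation.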

Summary

Introduction

Causality is an important concept that helps us to make sense of the world around us. This is exemplified by the Causality-by-default hypothesis (Sanders, 2005), according to which humans, when presented with two consecutive sentences expressing a relation that is ambiguous between a causal and an additive reading, commonly interpret the relation as causal. Counterfactual Theory tries to explain causality between two events C and E in terms of conditionals such as “If C had not occurred, E would not have occurred”. Probabilistic theories, on the other hand, try to explain causality based on the underlying probability of an event taking place in the world. The theory that has had the greatest impact on linguistic annotation of causality is probably Talmy’s Dynamic Force Model, which provides a framework that tries to distinguish weak and strong causal forces and captures different types of causality such as “letting”, “hindering”, “helping” or “intending”.

Methods

Discussion

Conclusion