Abstract

As an essential component of human cognition, cause–effect relations appear frequently in text, and curating cause–effect relations from text helps in building causal networks for predictive tasks. Existing causality extraction techniques include knowledge-based, statistical machine learning (ML)-based, and deep learning-based approaches. Each approach has its advantages and weaknesses. For example, knowledge-based methods are understandable but require extensive manual domain knowledge and have poor cross-domain applicability. Statistical machine learning methods are more automated because they rely on natural language processing (NLP) toolkits. However, feature engineering is labor-intensive, and toolkits may lead to error propagation. In the past few years, deep learning techniques have attracted substantial attention from NLP researchers because of their powerful representation learning ability and the rapid increase in computational resources. Their limitations include high computational costs and a lack of adequate annotated training data. In this paper, we conduct a comprehensive survey of causality extraction. We first introduce the primary forms of causality addressed in causality extraction: explicit intra-sentential causality, implicit causality, and inter-sentential causality. Next, we list benchmark datasets and model evaluation methods for causal relation extraction. Then, we present a structured overview of the three techniques with their representative systems. Lastly, we highlight open challenges and potential directions for addressing them.

Highlights

  • With the rapid growth of unstructured texts online, information extraction (IE) plays a vital role in natural language processing (NLP) research

  • Based on the assumption that dependency paths between cause and effect can be viewed as background knowledge, they use a wide range of such paths, regardless of whether cause and effect appear within one sentence or in adjacent sentences, taking web texts as extra input

  • Causal relations in natural language text play a key role in clinical decision-making, biomedical knowledge discovery, emergency management, news topic references, etc


Summary

Introduction

With the rapid growth of unstructured texts online, information extraction (IE) plays a vital role in NLP research. Relation extraction (RE) refers to extracting and classifying semantic relationships, such as whole–part, product–producer, and cause–effect, from text. For example, whether a disease is the reason for a symptom depends on whether a cause–effect relation holds between them. Extracting such causal relations from the medical literature can support constructing a knowledge graph, which can help doctors quickly find causality, such as diseases-cause-symptoms, diseases-bring-complications, and treatments-improve-conditions, and customize treatment plans. The task of causality extraction (CE) focuses on developing systems for identifying cause–effect relations between pairs of labeled nouns from text [5]. CE studies can be classified by representation pattern: explicit or implicit causality, and intra- or inter-sentential causality. Causality in many texts is implicit and/or inter-sentential, and these cases are more complicated than the basic kinds of causality.
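To make the distinction between explicit and implicit causality concrete, the following is a minimal sketch of a knowledge-based matcher for explicit intra-sentential causality. The cue phrases, the `PATTERNS` list, and the function name `extract_explicit_causality` are illustrative assumptions, not taken from any system covered by the survey; a real knowledge-based system would rely on a far larger, manually curated pattern set and on syntactic analysis rather than raw string matching.

```python
import re

# Hypothetical cue patterns (an assumption for illustration only).
# Named groups mark which side of the cue is the cause and which is the effect.
PATTERNS = [
    # "<cause> causes / leads to / results in <effect>"
    re.compile(
        r"^(?P<cause>.+?)\s+(?:causes?|leads? to|results? in)\s+(?P<effect>.+)$",
        re.IGNORECASE,
    ),
    # "<effect> is/was/are/were caused by <cause>"
    re.compile(
        r"^(?P<effect>.+?)\s+(?:is|was|are|were)\s+caused by\s+(?P<cause>.+)$",
        re.IGNORECASE,
    ),
]


def extract_explicit_causality(sentence):
    """Return a list of (cause, effect) pairs signalled by an explicit cue."""
    text = sentence.strip().rstrip(".")
    pairs = []
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            pairs.append((match.group("cause"), match.group("effect")))
    return pairs


if __name__ == "__main__":
    for s in [
        "Smoking causes lung cancer.",
        "The flood was caused by heavy rain.",
        "He resigned after the scandal.",  # no explicit causal cue
    ]:
        print(s, "->", extract_explicit_causality(s))
```

The third sentence contains no explicit causal cue, so the matcher returns nothing for it. This illustrates why implicit and inter-sentential causality are harder for pattern-based methods: no surface connective marks the relation, and the cause and effect may not share a sentence.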

Previous surveys
Benchmark datasets
Balanced Related works
Evaluation metrics
Knowledge-based approaches
Explicit intra-sentential causality
Implicit causality
Inter-sentential causality
Statistical machine learning-based approaches
Explicit intra-sentential causality
Deep learning-based approaches
Systems summary
Open problems and future directions
Conclusion
