Constructing efficient information extraction pipelines

Henning Wachsmuth, Gregor Engels, Benno Stein

doi:10.1145/2063576.2063935

Abstract

Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much potential for efficiency gains depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.

Similar Papers

Building a generic debugger for information extraction pipelines
Anish Das Sarma ... Alpa Jain
24 Oct 2011

Joint Inference over a Lightly Supervised Information Extraction Pipeline: Towards Event Coreference Resolution for Resource-Scarce Languages
Chen Chen ... Vincent Ng
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 30
05 Mar 2016

Extracting Reproductive Condition and Habitat Information from Text Using a Transformer-based Information Extraction Pipeline
Roselyn Gabud ... Riza Batista-Navarro
Biodiversity Information Science and Standards | VOL. 7
11 Sep 2023

Programming techniques for improving rule readability for rule-based information extraction natural language processing pipelines of unstructured and semi-structured medical texts
Nektarios Ladas ... Alina Rehberg
Health Informatics Journal | VOL. 29
01 Apr 2023