Objective: To develop an advanced multi-task large language model (LLM) framework for extracting diverse types of information about dietary supplements (DSs) from clinical records.

Materials and Methods: We focused on 4 core DS information extraction tasks: named entity recognition (2949 clinical sentences), relation extraction (4892 sentences), triple extraction (2949 sentences), and usage classification (2460 sentences). To address these tasks, we introduced the retrieval-augmented multi-task information extraction (RAMIE) framework, which incorporates: (1) instruction fine-tuning with task-specific prompts; (2) multi-task training of LLMs to improve storage efficiency and reduce training costs; and (3) retrieval-augmented generation, which retrieves similar examples from the training set to improve task performance. We compared RAMIE's performance to that of LLMs with instruction fine-tuning alone and conducted an ablation study to evaluate the individual contributions of multi-task learning and retrieval-augmented generation to the overall improvements.

Results: Using the RAMIE framework, Llama2-13B achieved an F1 score of 87.39 on the named entity recognition task, a 3.51% improvement. It also led the relation extraction task with an F1 score of 93.74, a 1.15% improvement. On the triple extraction task, Llama2-7B achieved an F1 score of 79.45, a substantial 14.26% improvement. MedAlpaca-7B delivered the highest F1 score, 93.45, on the usage classification task, a 0.94% improvement. The ablation study showed that while multi-task learning improved efficiency with a minor trade-off in performance, retrieval-augmented generation significantly enhanced accuracy across tasks.

Conclusion: The RAMIE framework demonstrates substantial improvements in multi-task information extraction for DS-related data from clinical records.
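To illustrate the retrieval-augmented generation step described above, the sketch below builds a prompt by retrieving the training example most similar to the input sentence and prepending it as a demonstration. This is a minimal illustration only: the bag-of-words cosine retriever, the `build_rag_prompt` helper, and the prompt wording are all hypothetical stand-ins, not the paper's actual implementation.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_rag_prompt(query: str, train_examples: list[tuple[str, str]], k: int = 1) -> str:
    """Retrieve the k training examples most similar to `query` and
    prepend them as demonstrations to a task instruction (hypothetical
    wording, shown here for the named entity recognition task)."""
    qv = Counter(query.lower().split())
    ranked = sorted(
        train_examples,
        key=lambda ex: cosine(qv, Counter(ex[0].lower().split())),
        reverse=True,
    )
    demos = "\n".join(f"Sentence: {s}\nEntities: {y}" for s, y in ranked[:k])
    return (
        "Extract dietary supplement entities from the sentence.\n"
        f"{demos}\n"
        f"Sentence: {query}\nEntities:"
    )


if __name__ == "__main__":
    train = [
        ("Patient takes fish oil daily.", "fish oil"),
        ("No supplements reported at this visit.", "none"),
    ]
    print(build_rag_prompt("She takes fish oil capsules.", train))
```

In the actual framework, a learned embedding model would replace the toy similarity function, and the retrieved demonstrations would be combined with the task-specific instruction used during fine-tuning.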