Background
Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently, the adoption of structured reporting (SR) has been recommended by various medical societies thanks to the advantages it offers, e.g., standardization, completeness, and information retrieval. We propose a pipeline to extract information from Italian free-text radiology reports that fits the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma.

Methods
Our work aims to leverage the potential of Natural Language Processing and Transformer-based models to deal with automatic SR registry filling. Using 174 available Italian radiology reports, we investigate a rule-free generative Question Answering approach based on the Italian-specific version of T5, IT5. To address information content discrepancies, we focus on the six most frequently filled items in the annotations made on the reports: three categorical (multichoice), one free-text (free-text), and two continuous numerical (factual). In the preprocessing phase, we also encode information that is not supposed to be entered. Two strategies (batch-truncation and ex-post combination) are implemented to comply with the IT5 context-length limitations. Performance is evaluated in terms of strict accuracy, f1, and format accuracy, and compared with the widely used GPT-3.5 Large Language Model. Unlike multichoice and factual answers, free-text answers do not have a one-to-one correspondence with their reference annotations. For this reason, we collect human-expert feedback on the similarity between medical annotations and generated free-text answers, using a 5-point Likert scale questionnaire (evaluating the criteria of correctness and completeness).

Results
The combination of fine-tuning and batch splitting allows the IT5 ex-post combination strategy to achieve notable information-extraction results across the different types of structured data, performing on par with GPT-3.5. Human assessment scores of free-text answers show a high correlation with the f1 performance metric (Spearman's correlation coefficients > 0.5, p-values < 0.001) for both IT5 ex-post combination and GPT-3.5. The latter is better at generating plausible, human-like statements, but it systematically provides answers even when none is supposed to be given.

Conclusions
In our experimental setting, a fine-tuned Transformer-based model with a modest number of parameters (i.e., IT5, 220 M) performs well as a clinical information extraction system for the automatic SR registry filling task. It can extract information from more than one place in the report, elaborating it in a manner that complies with the response specifications provided by the SR registry (for multichoice and factual items), or that closely approximates the work of a human expert (free-text items), while discerning whether or not an answer to a user query is supposed to be given.
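The context-length workaround described in the Methods, splitting each report into batches, answering the registry question on each batch, and combining the answers ex post, can be illustrated with a minimal sketch. The checkpoint name (gsarti/it5-base rather than the authors' fine-tuned model), the Italian prompt template, the "no answer" label, and the combination heuristic below are illustrative assumptions, not the published implementation.

```python
# Minimal sketch (not the authors' code) of generative QA with report chunking
# and ex-post answer combination. Checkpoint, prompt format, "no answer" label,
# and combination heuristic are assumptions for illustration only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "gsarti/it5-base"      # assumed base checkpoint; the paper fine-tunes IT5 on its own data
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

MAX_TOKENS = 512                    # T5-style encoder context limit
NO_ANSWER = "non presente"          # hypothetical target label for "no answer expected"

def split_report(report: str, question: str) -> list[str]:
    """Split a long report into chunks that fit the model context together with the question."""
    question_len = len(tokenizer(question)["input_ids"])
    budget = MAX_TOKENS - question_len - 2
    ids = tokenizer(report)["input_ids"]
    return [tokenizer.decode(ids[i:i + budget], skip_special_tokens=True)
            for i in range(0, len(ids), budget)]

def answer(question: str, report: str) -> str:
    """Ask the same registry question on every chunk, then combine answers ex post."""
    candidates = []
    for chunk in split_report(report, question):
        prompt = f"domanda: {question} contesto: {chunk}"  # assumed prompt template
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=MAX_TOKENS)
        output_ids = model.generate(**inputs, max_new_tokens=32)
        candidates.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    # Ex-post combination: keep informative per-chunk answers, otherwise fall back to "no answer".
    informative = [c for c in candidates if c.strip() and c.strip() != NO_ANSWER]
    return informative[0] if informative else NO_ANSWER
```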
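The agreement reported in the Results between expert Likert ratings of free-text answers and the f1 metric corresponds to a standard Spearman rank test; the sketch below shows the form of that check only, with placeholder score vectors rather than data from the study.

```python
# Sketch of the human-vs-metric agreement check. The vectors are hypothetical
# placeholders, not values from the paper.
from scipy.stats import spearmanr

likert_scores = [5, 4, 2, 5, 3, 1, 4, 5]                       # hypothetical 1-5 expert ratings
f1_scores = [0.91, 0.78, 0.35, 0.88, 0.52, 0.10, 0.69, 0.95]   # hypothetical per-answer f1

rho, p_value = spearmanr(likert_scores, f1_scores)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```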