A framework for information extraction from tables in biomedical literature

Nikola Milosevic, Goran Nenadic, Robert Hernandez, Cassie Gregson

doi:10.1007/s10032-019-00317-0

Open Access

https://doi.org/10.1007/s10032-019-00317-0

Abstract

The scientific literature is growing exponentially, and professionals are no longer able to cope with the current volume of publications. Text mining has provided methods to retrieve and extract information from text; however, most of these approaches ignore tables and figures. Research on mining table data still lacks an integrated approach that considers all the complexities and challenges of a table. Our research examines methods for extracting numerical (number of patients, age, gender distribution) and textual (adverse reactions) information from tables in the clinical literature. We present a requirement analysis template and an integral methodology for information extraction from tables in the clinical domain that contains 7 steps: (1) table detection, (2) functional processing, (3) structural processing, (4) semantic tagging, (5) pragmatic processing, (6) cell selection and (7) syntactic processing and extraction. Our approach achieved an F-measure between 82% and 92%, depending on the variable, task and its complexity.
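For readers unfamiliar with the metric, the F-measure is conventionally the harmonic mean of precision and recall. The sketch below shows that standard computation, assuming the balanced (F1) variant; the abstract does not state which variant was used. The function name `f_measure` is hypothetical and the counts are invented for illustration, not results from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # nothing predicted or nothing to find
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not taken from the paper).
score = f_measure(true_positives=90, false_positives=10, false_negatives=20)
print(f"F-measure: {score:.3f}")
```

With these invented counts, precision is 0.90 and recall is about 0.82, so the combined score falls between the two.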

Highlights

The literature in the biomedical domain is growing exponentially
The fields of text mining and natural language processing provide tools and methodologies that can help with retrieving relevant information
To examine why our method extracted the number of patients from only 26% of documents, we examined a sample of 25 documents from which the number of patients was not extracted and found the following reasons:

Summary

Introduction

The literature in the biomedical domain is growing exponentially. Currently, there are over 26 million articles indexed in MEDLINE [35]. Most current text mining approaches are limited to the textual body of articles, usually ignoring figures, tables and other semi-structured presentation formats of information. Textual content is usually dense, containing ambiguous short chunks of text with frequent use of acronyms and abbreviations. This is especially true in biomedical publications. In addition to natural language processing challenges, the layout and presentation of tables make it hard to understand the structure and the information that a table introduces. Information extraction from tables requires a multilayered analysis that includes functional, structural, pragmatic, syntactic and semantic analysis. Our research focuses on the task of extracting numerical and textual information from tables. We present a framework for information extraction from tables in biomedical documents. We compare a machine learning approach with a rule-based approach for identifying cells with information of interest, and evaluate how and where machine learning can support efficient information extraction from tables.
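To make the layered analysis more concrete, here is a toy sketch of four of the layers: functional processing, structural processing, rule-based cell selection and syntactic extraction. It is an illustration under strong assumptions, not the authors' implementation: the `Cell` class, all function names, the "first row is the header" rule and the example table are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    role: str = "data"   # functional role: "header" or "data"
    header: str = ""     # structural link to the column header


def functional_processing(rows):
    """Toy assumption: the first row holds headers, the rest hold data."""
    return [
        [Cell(text, role="header" if i == 0 else "data") for text in row]
        for i, row in enumerate(rows)
    ]


def structural_processing(table):
    """Link every data cell to the header cell of its column."""
    headers = [cell.text for cell in table[0]]
    for row in table[1:]:
        for cell, header in zip(row, headers):
            cell.header = header
    return table


def select_cells(table, keyword):
    """Lexical rule: keep data cells whose column header mentions the keyword."""
    return [
        cell
        for row in table[1:]
        for cell in row
        if keyword.lower() in cell.header.lower()
    ]


def extract_numbers(cells):
    """Syntactic rule: take the first integer found in each selected cell."""
    values = []
    for cell in cells:
        match = re.search(r"\d+", cell.text)
        if match:
            values.append(int(match.group()))
    return values


# Invented example table, not data from the paper.
rows = [
    ["Group", "Number of patients", "Mean age"],
    ["Treatment", "n = 48", "61.2"],
    ["Placebo", "n = 50", "60.7"],
]
table = structural_processing(functional_processing(rows))
patients = extract_numbers(select_cells(table, "patients"))
print(patients)
```

Real tables in the literature rarely have such a clean layout (multi-row headers, stub columns and spanning cells are common), which is why each layer needs far more than the single rule shown here.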

Background

Extraction template

Numeric information groups

Textual variables

Information extraction methodology

Table detection

Functional processing

Structural processing

Semantic tagging

Pragmatic processing

Cell selection

Pattern analysis and value extraction

Cell selection using lexical and semantic rules

Syntactic rules and syntactic processing

Functional and structural table analysis

Dataset

Rule-based information extraction

Machine learning-based information extraction

Generalizability case study

Document reading and table detection

Functional and structural processing

Pragmatic analysis

Table annotation

Cell selection and syntactic analysis

Remarks about the generalizability of framework

Conclusion

Findings

Future work

Journal: International Journal on Document Analysis and Recognition (IJDAR)
Publication Date: Feb 15, 2019
Citations: 43
License type: open-access
