Extraction and generalisation of variables from scientific publications

Erwin Marsi, Pinar Öztürk

doi:10.18653/v1/d15-1057

Abstract

Scientific theories and models in Earth science typically involve changing variables and their complex interactions, including correlations, causal relations and chains of positive/negative feedback loops. Variables tend to be complex rather than atomic entities and are expressed as noun phrases containing multiple modifiers, e.g. "oxygen depletion in the upper 500 m of the ocean" or "timing and magnitude of surface temperature evolution in the Southern Hemisphere in deglacial proxy records". Text mining from Earth science literature is therefore significantly different from biomedical text mining and requires different approaches and methods. Our approach aims at automatically locating and extracting variables and their direction of variation: increasing, decreasing or just changing. Variables are initially extracted by matching tree patterns onto the syntax trees of the source texts. Next, variables are generalised in order to enhance their similarity, facilitating hierarchical search and inference. This generalisation is accomplished by progressive pruning of syntax trees using a set of tree transformation operations. Text mining results are presented as a browsable variable hierarchy which allows users to inspect all mentions of a particular variable type in the text as well as any generalisations or specialisations. The approach is demonstrated on a corpus of 10k abstracts of Nature publications in the field of Marine science. We discuss experiences with this early prototype and outline a number of possible improvements and directions for future research.

Highlights

As a partial solution to the low frequency of long, complex variable expressions, we propose progressive pruning of syntax trees using a set of tree transformation operations.
We have argued that the paradigm established in biomedical text mining does not transfer directly to other scientific domains like Earth science.
A new approach was proposed for extracting variables and their direction of variation, focusing on events rather than entities.
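
The idea of pairing a variable with its direction of variation can be illustrated with a deliberately simplified sketch. It does not use the tree patterns over syntax trees described in the paper; instead it assumes a small hand-made lexicon of trigger words (`TRIGGERS`) and a flat regular expression, both of which are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical trigger lexicon: surface word -> direction of variation.
# The paper matches tree patterns on parsed sentences; this flat lexicon
# is an assumption made only to keep the sketch short.
TRIGGERS = {
    "increase": "increasing",
    "rise": "increasing",
    "decrease": "decreasing",
    "decline": "decreasing",
    "reduction": "decreasing",
    "change": "changing",
    "variation": "changing",
}

# "<trigger>(s) in|of <variable>", where the variable runs up to the
# next comma, full stop or semicolon.
PATTERN = re.compile(
    r"\b(?P<trigger>" + "|".join(TRIGGERS) + r")s? (?:in|of) (?P<var>[^,.;]+)",
    re.IGNORECASE,
)


def extract(sentence: str) -> list[tuple[str, str]]:
    """Return (variable, direction) pairs found in one sentence."""
    return [
        (m.group("var").strip(), TRIGGERS[m.group("trigger").lower()])
        for m in PATTERN.finditer(sentence)
    ]


if __name__ == "__main__":
    example = (
        "We observe a decline in oxygen concentration, "
        "and an increase in sea surface temperature."
    )
    for variable, direction in extract(example):
        print(f"{direction}: {variable}")
```

A surface pattern like this breaks down quickly on nested or coordinated noun phrases, which is one reason to match patterns on syntax trees instead.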

Summary

Introduction

Text mining of scientific literature originates from efforts to cope with the ever-growing flood of publications in biomedicine (Swanson, 1986; Swanson, 1988; Swanson and Smalheiser, 1997; Hearst, 1999; Ananiadou et al., 2006; Zweigenbaum et al., 2007; Cohen and Hersh, 2005; Krallinger et al., 2008; Rodriguez-Esteban, 2009; Zweigenbaum and Demner-Fushman, 2009; Ananiadou et al., 2010; Simpson and Demner-Fushman, 2012; Ananiadou et al., 2014). We found that, due to significant differences between the conceptual frameworks of biomedicine and marine science, "porting" the biomedical text mining infrastructure to another domain will not suffice. Defining the entities of interest in marine science turns out to be much harder. Not only does the set of entities seem to be more open-ended in nature, but the entities themselves tend to be complex and are expressed as noun phrases containing multiple modifiers, giving rise to examples like "oxygen depletion in the upper 500 m of the ocean" or "timing and magnitude of surface temperature evolution in the Southern Hemisphere in deglacial proxy records". Since many of these changing variables are long and complex expressions, their frequency of occurrence tends to be low, making the discovery of relations among different variables harder. Text mining results are presented as a browsable variable hierarchy which allows users to inspect all mentions of a particular variable type in the text as well as any generalisations or specialisations.
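
To make the notion of generalisation by progressive pruning more concrete, here is a minimal sketch. It assumes a variable has already been segmented by hand into premodifiers, a head and trailing prepositional phrases. The `Variable` class, the `generalise` function and the pruning order are hypothetical simplifications, not the tree transformation operations used by the authors, which work on full syntax trees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """A hand-segmented noun phrase (assumed structure, not a real parse)."""
    premods: tuple[str, ...]   # modifiers before the head
    head: str                  # head noun
    postmods: tuple[str, ...]  # trailing prepositional phrases

    def text(self) -> str:
        return " ".join((*self.premods, self.head, *self.postmods))


def generalise(var: Variable) -> list[Variable]:
    """Return the variable followed by progressively more general forms.

    Assumed pruning order: drop trailing prepositional phrases from right
    to left, then drop premodifiers from left to right.
    """
    chain = [var]
    while var.postmods:
        var = Variable(var.premods, var.head, var.postmods[:-1])
        chain.append(var)
    while var.premods:
        var = Variable(var.premods[1:], var.head, var.postmods)
        chain.append(var)
    return chain


if __name__ == "__main__":
    mention = Variable(
        premods=("oxygen",),
        head="depletion",
        postmods=("in the upper 500 m", "of the ocean"),
    )
    # Print the chain as an indented hierarchy, most general form first.
    for depth, v in enumerate(reversed(generalise(mention))):
        print("  " * depth + v.text())
```

Because each pruned form is shorter, mentions that differ only in their outer modifiers end up under the same more general form, which is what makes a browsable hierarchy possible.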

Variable extraction

Variable generalisation

User interface

Findings

Discussion


Publication Date: Jan 1, 2015
Citations: 31
License type: CC BY

