Extracting and modeling geographic information from scientific articles.

Elise Acheson, Ross S Purves

doi:10.1371/journal.pone.0244918

Abstract

Scientific articles often contain relevant geographic information such as where field work was performed or where patients were treated. Most often, this information appears in the full-text article contents as a description in natural language including place names, with no accompanying machine-readable geographic metadata. Automatically extracting this geographic information could help conduct meta-analyses, find geographical research gaps, and retrieve articles using spatial search criteria. Research on this problem is still in its infancy, with many works manually processing corpora for locations and few cross-domain studies. In this paper, we develop a fully automatic pipeline to extract and represent relevant locations from scientific articles, applying it to two varied corpora. We obtain good performance, with full pipeline precision of 0.84 for an environmental corpus, and 0.78 for a biomedical corpus. Our results can be visualized as simple global maps, allowing human annotators to both explore corpus patterns in space and triage results for downstream analysis. Future work should not only focus on improving individual pipeline components, but also be informed by user needs derived from the potential spatial analysis and exploration of such corpora.

Highlights

Geographical information permeates the written world, appearing as place names or place descriptions in texts including news articles, blog posts, social media content, historical documents, and scientific articles.
Research on extracting geographical information from text has often focused on news articles [1,2,3] and social media content [4,5,6], with surprisingly limited attention directed towards the increasing number of published scientific articles.
We develop a fully automatic pipeline which starts from a collection of scientific articles (PDF) and outputs a set of location strings and their sentence context, as well as structured information and a geometric representation for each string (Fig 1).

Summary

Introduction

Geographical information permeates the written world, appearing as place names or place descriptions in texts including news articles, blog posts, social media content, historical documents, and scientific articles. With each passing year, scientists face an ever-growing stack of scientific articles to sort through, read, understand, and build upon. Many of these articles contain important geographical information: perhaps soil samples were taken from a certain region, patients were treated in a particular hospital, or interviews were conducted in a village or neighborhood. Researchers must manually sift through article contents to identify any relevant locations, a time-consuming process. Linking these textual place descriptions to spatial representations (such as point coordinates, a bounding box, or a polygonal region) requires significant additional work and should ideally respect the scale and precision of locations described in the text. Good performance is obtained after adding custom pre- and post-processing steps, such as enhancing word lists with geology-specific terms and detecting citations in order to skip them as location candidates.
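To make the idea of linking place names to spatial representations more concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a tiny hand-written gazetteer (`GAZETTEER`, with rough illustrative bounding boxes), a simple author-year citation pattern (`CITATION`), and a naive sentence splitter; all of these names and values are hypothetical. It shows how stripping citations before matching keeps an author surname such as "Paris" from being reported as a location.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: place name -> bounding box
# (min_lon, min_lat, max_lon, max_lat). Values are rough and illustrative only.
GAZETTEER = {
    "Zurich": (8.45, 47.32, 8.63, 47.43),
    "Paris": (2.22, 48.81, 2.47, 48.90),
}

# Assumed citation style: "(Smith, 2010)" or "(Smith et al., 2010)".
CITATION = re.compile(r"\([A-Z][A-Za-z\-]+(?: et al\.)?,? \d{4}\)")


@dataclass
class Location:
    name: str       # matched location string
    sentence: str   # original sentence, kept as context
    bbox: tuple     # geometric representation (bounding box)


def extract_locations(text):
    """Return gazetteer matches per sentence, ignoring author-year citations."""
    results = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Remove citations so author names are not treated as place names.
        cleaned = CITATION.sub("", sentence)
        for name, bbox in GAZETTEER.items():
            if re.search(rf"\b{re.escape(name)}\b", cleaned):
                results.append(Location(name, sentence, bbox))
    return results


if __name__ == "__main__":
    sample = ("Soil samples were taken near Zurich (Paris et al., 2015). "
              "Patients were treated elsewhere.")
    for loc in extract_locations(sample):
        print(loc.name, loc.bbox)
```

A real system would replace the dictionary lookup with a named entity recognizer and a full gazetteer, and would need to choose among candidate places with the same name; the bounding box here simply stands in for whichever geometry best fits the scale of the place described.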



Journal: PLOS ONE	Publication Date: Jan 6, 2021
Citations: 12	License type: CC BY 4.0
