Abstract

Location information in published studies represents an untapped resource for literature discovery, applicable to a range of domains. The ability to easily discover scientific articles from specific places, nearby locales, or similar (but geographically separate) areas worldwide is important for advancing science and addressing global sustainability challenges. However, because current search tools are organized thematically rather than geographically, location-based searches are challenging and inefficient. Manually geolocating studies is labor-intensive, and place-name recognition algorithms have performed poorly because scientific articles contain many irrelevant place names. These challenges have hindered past efforts to create map-based literature search tools, so automated approaches are needed to sustain article georeferencing efforts. Common pattern-matching algorithms (parsers) can be used to identify and extract geographic coordinates from the text of published articles. Two such pattern-matching algorithms (geoparsers), one using regular expressions and one using lexical parsing, were developed, and their performance was tested against sets of full-text articles from multiple journals that had been manually scanned for coordinates. Both geoparsers performed well at recognizing and extracting coordinates, with accuracy ranging from 85.1% to 100%, and the lexical geoparser performed marginally better. Omission errors (i.e., missed coordinates) ranged from 0% to 14.9% for the regular expression geoparser and from 0% to 10.3% for the lexical geoparser. Only a single commission error (i.e., an erroneous coordinate) was encountered, with the lexical geoparser. The ability to automatically identify and extract location information from published studies opens new possibilities for transforming scientific literature discovery and supporting novel research.
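To illustrate what a regular-expression geoparser does, the following is a minimal sketch, not the authors' implementation. It assumes coordinates are written as degrees (with a degree sign), optional minutes and seconds, and a trailing hemisphere letter (e.g. 34°12′30″ N or 45.5° S). The pattern name `COORD_PATTERN` and the functions `to_decimal` and `extract_coordinates` are hypothetical, and the range checks are one illustrative way to suppress commission errors. The lexical-parsing approach is not shown.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: degrees with a degree sign, optional minutes,
# optional seconds, then a hemisphere letter (N, S, E, or W).
COORD_PATTERN = re.compile(
    r"""
    (?<![\d.])                                          # do not start mid-number
    (?P<deg>\d{1,3}(?:\.\d+)?)\s*[°º]\s*                # degrees and degree sign
    (?:(?P<min>\d{1,2}(?:\.\d+)?)\s*['′’]\s*)?          # optional minutes
    (?:(?P<sec>\d{1,2}(?:\.\d+)?)\s*(?:''|["″”])\s*)?   # optional seconds
    (?P<hem>[NSEW])\b                                   # hemisphere letter
    """,
    re.VERBOSE,
)


def to_decimal(deg: str, minutes: Optional[str], seconds: Optional[str], hem: str) -> float:
    """Convert degree/minute/second strings to signed decimal degrees (S and W are negative)."""
    value = float(deg) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hem in "SW" else value


def extract_coordinates(text: str) -> List[Tuple[str, float]]:
    """Return (matched text, signed decimal degrees) for each plausible coordinate in text."""
    results = []
    for m in COORD_PATTERN.finditer(text):
        minutes, seconds = m.group("min"), m.group("sec")
        # Reject impossible minute/second values (illustrative commission-error filter).
        if float(minutes or 0) >= 60 or float(seconds or 0) >= 60:
            continue
        value = to_decimal(m.group("deg"), minutes, seconds, m.group("hem"))
        # Latitudes (N/S) must lie within 90 degrees, longitudes (E/W) within 180.
        limit = 90 if m.group("hem") in "NS" else 180
        if abs(value) > limit:
            continue
        results.append((m.group(0), value))
    return results


if __name__ == "__main__":
    sample = "Plots were located at 34°12′30″ N, 118°15′ W and at 45.5° S."
    for raw, dec in extract_coordinates(sample):
        print(f"{raw!r} -> {dec:.6f}")
```

Running the script on the sample sentence yields three coordinates: 34.208333, -118.250000, and -45.500000. A real geoparser would need many more notational variants (for example, spelled-out hemispheres or coordinates split across table cells), which is where omission errors of the kind reported above tend to arise.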
