Extracting Geoscientific Dataset Names from the Literature Based on the Hierarchical Temporal Memory Model

Kai Wu, Guoqing Li, Shaohua Wang, Zugang Chen, Haodong Wang, Jing Li, Hang Feng, Xinqian Wu

doi:10.3390/ijgi13070260

Abstract

Extracting geoscientific dataset names from the literature is crucial for building a literature–data association network, which can help readers access the data quickly through the Internet. However, existing named-entity extraction methods have low accuracy on geoscientific dataset names in unstructured text because these names are complex combinations of multiple elements, such as geospatial coverage, temporal coverage, scale or resolution, theme content, and version. This paper proposes a new method based on the hierarchical temporal memory (HTM) model, a brain-inspired neural network with superior performance in high-level cognitive tasks, to accurately extract geoscientific dataset names from unstructured text. First, we propose a word-encoding method for the HTM model based on the Unicode values of characters. Then, we collected over 12,000 dataset names from geoscience data-sharing websites and encoded them into binary vectors to train the HTM model. We devised a new classifier scheme for the HTM model that decodes the predictive vector into the encoding of the predicted next word, so that the similarity between the encodings of the predicted next word and the actual next word can be computed. If the similarity is greater than a specified threshold, the actual next word is regarded as part of the name, and the resulting run of consecutive words forms the full geoscientific dataset name. We used the trained HTM model to extract geoscientific dataset names from 100 papers. Our method achieved an F1-score of 0.727, outperforming the few-shot learning (FSL) methods based on GPT-4 and Claude-3, which achieved F1-scores of 0.698 and 0.72, respectively.
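The threshold rule described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not give the exact bit layout of the Unicode-based word encoding, the similarity measure, or the threshold value, so `encode_word`, the Jaccard-style `similarity`, `MAX_CHARS`, and the default `threshold=0.8` are hypothetical. The `toy_predictor` lookup table merely stands in for a trained HTM model, and the example sentence is made up.

```python
from typing import Callable, List, Set

BITS_PER_CHAR = 21  # Unicode code points fit in 21 bits (maximum is U+10FFFF)
MAX_CHARS = 16      # hypothetical cap on word length; longer words are truncated


def encode_word(word: str) -> Set[int]:
    """Return the indices of the active bits of a binary vector for `word`.

    Assumed layout (not taken from the paper): each character contributes
    the set bits of its code point, offset by its position in the word.
    """
    active: Set[int] = set()
    for pos, ch in enumerate(word[:MAX_CHARS]):
        code = ord(ch)
        for bit in range(BITS_PER_CHAR):
            if (code >> bit) & 1:
                active.add(pos * BITS_PER_CHAR + bit)
    return active


def similarity(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap of two active-bit sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def extend_name(
    words: List[str],
    start: int,
    predict_next: Callable[[List[str]], Set[int]],
    threshold: float = 0.8,  # hypothetical value
) -> List[str]:
    """Grow a candidate name from words[start] while predictions match."""
    name = [words[start]]
    for actual in words[start + 1:]:
        predicted = predict_next(name)
        # Keep the actual next word only if it is similar enough to the prediction.
        if similarity(predicted, encode_word(actual)) > threshold:
            name.append(actual)
        else:
            break
    return name


def toy_predictor(name_so_far: List[str]) -> Set[int]:
    """Stand-in for a trained HTM model: a fixed next-word lookup table."""
    table = {"Global": "Land", "Land": "Cover", "Cover": "Dataset"}
    return encode_word(table.get(name_so_far[-1], ""))


words = "We used the Global Land Cover Dataset for validation".split()
print(extend_name(words, 3, toy_predictor))
```

In this toy setting the loop stops at "for" because the stand-in predictor has no continuation after "Dataset". Note that under this assumed encoding, words made of ASCII letters share many high-order bits, so unrelated words can still overlap substantially and the threshold would need tuning against real data.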

Journal: ISPRS International Journal of Geo-Information
Publication Date: Jul 21, 2024
License type: CC BY 4.0
