GKEEP: An Enhanced Graph‐Based Keyword Extractor With Error‐Feedback Propagation for Geoscience Reports

Qinjun Qiu,Qinjun Qiu,Bin Wang,Zhong Xie,Zhong Xie,Bin Wang,Hong Xie

doi:10.1029/2020ea001602

Qinjun Qiu, Qinjun Qiu + Show 5 more

Open Access

PDF Available

https://doi.org/10.1029/2020ea001602

Copy DOI

Export

Save

Cite

Journal: Earth and Space Science	Publication Date: May 1, 2021
Citations: 11	License type: CC BY 4.0

Affiliation: China University of Geosciences

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

AbstractAs the amount of published geoscience literature grows, reading and summarizing texts of large collections has become a challenging task. Publication keywords can be considered basic components of knowledge structure representations and have been used to reveal knowledge concerning research domains. In contrast to data used in other research domains, the works on textual geoscience data that entail keyword extraction are limited. In this paper, we propose an unsupervised algorithm, the graph‐based keyword extractor with error‐feedback propagation (GKEEP), that enhances graph‐based keyword extraction approaches by using an error‐feedback mechanism similar to the concept of backpropagation. The proposed approach comprises the following steps. A preprocessed document is used as the input of the proposed model and is represented as a weighted undirected graph, where the vertices represent words and the edges represent the cooccurrence relationship between the words constrained by a window size. Subsequently, its nodes are ranked by their importance scores calculated by a graph‐based ranking algorithm. Consequently, all the words have their own scores, and they are used to compute the scores of keyword candidates. Subsequently, the Word2Vec method is applied to recalculate the scores of keyword candidates and rank the keyword candidates to select the final keyword. It also utilizes error feedback to boost the rankings of the most salient terms that would otherwise be deemed less important. With empirical experiments on two real data sets (including our newly built data set), the proposed GKEEP model outperforms state‐of‐the‐art unsupervised models and the existing graph‐based ranking models. The proposed method can effectively reflect intrinsic keyword semantics and interrelationships.

Full Text