Texts have become an important source of spatial data. Interpreting unstructured geoscience texts with natural language processing methods can effectively facilitate the discovery and retrieval of geographic information. Yet studies on extracting spatial information from textual geoscience data remain limited compared with those on digital geoscience data. In this work, we propose a machine learning approach for mining spatial relations in Chinese geological texts. The approach treats spatial relation extraction as a sequence labeling problem. This avoids having to partition relations into predefined categories during extraction and enables the mining of fine-grained spatial relations. The geological texts processed commonly describe three-dimensional spatial relations among regions, strata, and lithologies. The extracted spatial relations are classified into three major categories (topological relations, absolute directional relations, and relative directional relations) and 14 subcategories. We validated the proposed model on a test dataset, built visual displays of the extracted spatial relations on different topics, and quantified the uncertainty in the pipeline from spatial entity recognition to spatial relation extraction. By portraying these spatial relations in detail, this study supports the solution of theoretical and practical problems of cognition, prediction, decision-making, and evaluation in geoscience.
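To illustrate what "spatial relation extraction as sequence labeling" can mean in practice, here is a minimal sketch of the decoding step only, not the authors' model. The abstract does not state the tagging scheme or label set, so this assumes a character-level BIO scheme with two hypothetical labels, `ENT` (spatial entity) and `REL` (spatial relation expression). The function name `decode_bio` and the example sentence are likewise illustrative assumptions. A trained tagger would emit one tag per character, and this step groups tagged characters back into labeled spans.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    Hypothetical scheme: "B-X" begins a span of label X, "I-X" continues it,
    and "O" marks tokens outside any span. Tokens are joined with no
    separator, which suits character-level Chinese input.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    buf: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # Close any open span; a stray "I-" with a new label also opens one.
            if label is not None:
                spans.append((label, "".join(buf)))
            label, buf = tag[2:], [tok]
        elif tag.startswith("I-"):
            # Continue the current span.
            buf.append(tok)
        else:
            # "O": close any open span.
            if label is not None:
                spans.append((label, "".join(buf)))
            label, buf = None, []
    if label is not None:
        spans.append((label, "".join(buf)))
    return spans


# Illustrative sentence: "砂岩位于页岩之上" ("the sandstone lies above the shale").
chars = list("砂岩位于页岩之上")
tags = ["B-ENT", "I-ENT", "O", "O", "B-ENT", "I-ENT", "B-REL", "I-REL"]
print(decode_bio(chars, tags))  # [('ENT', '砂岩'), ('ENT', '页岩'), ('REL', '之上')]
```

Because every character receives a tag rather than every entity pair receiving a class, a relation expression such as "之上" is recovered as a text span. Mapping it to one of the 14 subcategories could then happen as a separate step. That matches the abstract's point that extraction itself does not require dividing relations into categories.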