Abstract

AbstractGeological reports are frequently used by geologists involved in geological surveys and scientific research to record the results and outcomes of geological surveys. With such a rich data source, a substantial amount of knowledge has yet to be mined and analyzed. This paper focuses on automatically information extraction from geological reports, namely, geological named entity recognition. Geological named entity recognition has an important role in data mining, knowledge discovery and Knowledge graph construction. Existing general named entity recognition models/tools are limited in the domain of geoscience due to the various language irregularities associated with geological text, such as informal sentence structures, several domain‐geoscience words, large character lengths and multiple combinations of independent words. We present Bidirectional encoder representations from transformers (BERT)‐(Bidirectional gated recurrent unit network) BiGRU‐ (Conditional random field) CRF, which is a deep learning‐based geological named entity recognition model that is designed specifically with these linguistic irregularities in mind. Based on the pretrained language model, an integrated deep learning model incorporating BERT, BiGRU and CRF is constructed to obtain character vectors rich in semantic information through the BERT pretrained language model to alleviate for the lack of specificity of static word vectors (e.g., word2vec) and to improve the extraction capability of complex geological entities. We demonstrate our proposed model by applying it to four test datasets, including a geoscience NER data set from regional geological reports, and by comparing its performance with those of five baseline models.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call