Abstract

Advances in natural language processing and deep learning have made geological text data a vital resource for knowledge discovery, attracting the attention of publishers, academic organizations, and domain scientists. However, extracting information from unstructured literature remains a challenge, and a fundamental issue is defining the categories and types of discipline-specific information to extract. This paper presents an effective workflow for building and applying ontologies in geoscience text mining. The workflow comprises a use case-driven method for building an ontology model of porphyry copper deposits, an entity annotation schema for text mining, and the application of both to real-world data. First, the Dexing porphyry copper deposit was selected as a case study to guide the construction of the ontology model. The text data in this study provided a series of entity instances. By analyzing both the domain knowledge of mineral deposit models and the instance data, we built the classes in the ontology. Second, using the established ontology, a named entity annotation schema comprising 21 entity tokens was designed to scale up the text mining tasks. Third, based on the annotation schema, a draft corpus of more than 200,000 words and a finely corrected corpus of 53,339 words were built for training a geological entity recognizer for porphyry copper deposits. The performance of the geological entity recognizer and the statistical distribution of entities in the corpus indicate that the workflow presented in this study is effective for designing entity annotation schemas and facilitating large-scale text data mining in geoscience.
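
As a rough illustration of how an entity annotation schema can turn annotated text into training data for an entity recognizer, the sketch below converts token-level entity spans into per-token tags. It is a minimal sketch under stated assumptions, not the method used in the paper: the BIO tagging convention, the function name `spans_to_bio`, the three entity types, and the example sentence are all hypothetical, since the 21 entity tokens of the schema are not listed in this abstract.

```python
from typing import List, Tuple

# Hypothetical entity types for illustration only; the abstract does not
# enumerate the 21 entity tokens of the actual annotation schema.
ENTITY_TYPES = {"MINERAL", "ROCK", "ALTERATION"}


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, entity_type) token spans to BIO tags.

    BIO is an assumed tagging convention: B- marks the first token of an
    entity, I- marks its continuation, and O marks tokens outside any entity.
    """
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: ({start}, {end})")
        tags[start] = f"B-{entity_type}"
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"
    return tags


# Hypothetical example sentence, already split into tokens.
tokens = ["Chalcopyrite", "occurs", "in", "granodiorite", "porphyry", "."]
spans = [(0, 1, "MINERAL"), (3, 5, "ROCK")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Restricting the spans to a fixed set of entity types mirrors the role the ontology plays in the workflow: an annotation with a type outside the schema is rejected instead of silently entering the corpus.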
