Abstract

One of the challenges of improving the search and use of complex Earth science data is designing and incorporating semantic components in existing Earth science data systems. Many projects have addressed this challenge with a knowledge engineering approach built on ontologies. However, ontologies have inherent limitations as a practical and scalable solution. Data-driven strategies based on natural language processing, coupled with machine learning, provide an alternative. Data-driven approaches exploit existing corpora available as unstructured text. This paper describes a hybrid strategy that uses a data-driven approach to build an embedding from a large corpus of Earth science journal publications, while leveraging existing ontologies to develop validation tests that evaluate the embedding's robustness and correctness. The paper also describes the use of this embedding in two applications. The first provides a semantic mapping service that bridges the gap between a science application need and the instruments or datasets required to address that need. The second is a keyword recommender that makes dataset tagging more efficient for data operators and ensures keyword consistency within a data catalog.
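
As a rough illustration of the hybrid idea, the sketch below ranks candidate keywords by cosine similarity in an embedding and runs one ontology-style validation check. It is not the paper's implementation: the vectors are made-up toy values standing in for a learned embedding, and the names `EMBEDDING`, `recommend`, and `passes_ontology_check` are hypothetical.

```python
import math

# Toy 3-dimensional vectors standing in for a learned embedding.
# The values are invented for illustration only (assumption).
EMBEDDING = {
    "precipitation": [0.9, 0.1, 0.0],
    "rainfall": [0.8, 0.2, 0.1],
    "radiometer": [0.1, 0.9, 0.2],
    "aerosol": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def recommend(term, candidates, k=2):
    """Return the k candidates most similar to `term` (hypothetical helper)."""
    scored = [
        (c, cosine(EMBEDDING[term], EMBEDDING[c]))
        for c in candidates
        if c != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in scored[:k]]


def passes_ontology_check(related_pair, unrelated_term):
    """Validation test: a pair that an ontology marks as related should be
    closer in the embedding than the first term is to an unrelated term."""
    a, b = related_pair
    related = cosine(EMBEDDING[a], EMBEDDING[b])
    unrelated = cosine(EMBEDDING[a], EMBEDDING[unrelated_term])
    return related > unrelated


suggestions = recommend("precipitation", list(EMBEDDING), k=2)
check = passes_ontology_check(("precipitation", "rainfall"), "aerosol")
```

In a real system the vectors would come from a model trained on the publication corpus, and the related and unrelated pairs would be drawn from an existing ontology rather than written by hand.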
