Abstract

A longstanding complication in Earth data discovery is understanding a user’s search intent from the input query. Most geospatial data portals rely on keyword-based matching to search data. Little attention has been paid to the spatial and temporal information in a query or to understanding the query with an ontology. No research in the geospatial domain has investigated user queries in a systematic way. Here, we propose a query understanding framework that fills this gap by better interpreting a user’s search intent for Earth data search engines, adopting knowledge mined from metadata and user query logs. The proposed query understanding tool contains four components: spatial and temporal parsing; concept recognition; Named Entity Recognition (NER); and semantic query expansion. Spatial and temporal parsing detects the spatial bounding box and temporal range from a query. Concept recognition isolates clauses from free text and provides the search engine with phrases instead of a list of words. Named entity recognition detects entities in the query, which informs the search engine to query the detected entities. The semantic query expansion module expands the original query by adding synonyms and acronyms, discovered from Web usage data and metadata, to phrases in the query. The four modules interact to parse a user’s query from multiple perspectives, with the goal of understanding the user’s search intent for data. As a proof of concept, the framework is applied to oceanographic data discovery. The results demonstrate that the proposed framework accurately captures a user’s intent.
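To make the division of labor among the four components concrete, the following is a minimal sketch in Python, not the framework's actual implementation. The lookup tables (`GAZETTEER`, `CONCEPTS`, `ENTITIES`, `SYNONYMS`), the `ParsedQuery` class, and the `understand_query` function are hypothetical names introduced for illustration. The bounding box values are rough placeholders, and simple dictionary and substring lookups stand in for the knowledge that the framework mines from metadata and query logs.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy knowledge bases (assumptions for illustration only).
# Bounding boxes are (west, south, east, north) with rough placeholder values.
GAZETTEER = {"gulf of mexico": (-98.0, 18.0, -80.0, 31.0)}
CONCEPTS = ["sea surface temperature", "ocean wind"]
ENTITIES = {"modis": "instrument", "aqua": "platform"}
SYNONYMS = {"sea surface temperature": ["sst"]}


@dataclass
class ParsedQuery:
    bbox: Optional[tuple] = None       # spatial bounding box, if a place is found
    years: Optional[tuple] = None      # (start_year, end_year), if years are found
    concepts: list = field(default_factory=list)    # recognized multi-word phrases
    entities: dict = field(default_factory=dict)    # token -> entity type
    expansions: dict = field(default_factory=dict)  # phrase -> synonyms/acronyms


def understand_query(query: str) -> ParsedQuery:
    text = query.lower()
    parsed = ParsedQuery()

    # 1. Temporal parsing: treat four-digit years as the temporal range.
    found_years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]
    if found_years:
        parsed.years = (min(found_years), max(found_years))

    # 1. Spatial parsing: map the first known place name to a bounding box.
    for place, bbox in GAZETTEER.items():
        if place in text:
            parsed.bbox = bbox
            break

    # 2. Concept recognition: keep known phrases whole, longest first.
    for phrase in sorted(CONCEPTS, key=len, reverse=True):
        if phrase in text:
            parsed.concepts.append(phrase)

    # 3. Named entity recognition: dictionary lookup on single tokens.
    for token in re.findall(r"[a-z0-9]+", text):
        if token in ENTITIES:
            parsed.entities[token] = ENTITIES[token]

    # 4. Semantic query expansion: attach synonyms and acronyms to each phrase.
    for phrase in parsed.concepts:
        parsed.expansions[phrase] = SYNONYMS.get(phrase, [])

    return parsed


result = understand_query("MODIS sea surface temperature Gulf of Mexico 2010 to 2015")
print(result)
```

In this sketch each step writes to a separate field of `ParsedQuery`, so a search engine could apply the bounding box and year range as filters while using the phrases, entities, and expansions for text matching.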

Highlights

  • Discovering Earth science data is challenging, given the data’s increased volume, decreased latency, and heterogeneity across a wide variety of domains [1]

  • A set of synthetic queries is generated from PO.DAAC logs, combining each user input query with the corresponding filters that narrow the search, to evaluate the accuracy of the query understanding framework

  • Query understanding focuses on the beginning of the search process and it is significant for


Summary

Introduction

Discovering Earth science data is challenging, given the data’s increased volume, decreased latency, and heterogeneity across a wide variety of domains [1]. Some geospatial data portals have made great efforts to boost their search capabilities by introducing advanced technologies from the computer science domain or customized configurations. For example, the PO.DAAC data portal introduced “Google-like” query syntax to support phrase queries such as “level 2” [2]; GeoNetwork [3], an open-source, distributed spatial information management system, indexes geospatial data and supports data discovery on top of Lucene [4]. With advanced configuration of indexed fields and search fields, relevant keywords are not split apart during indexing and searching. Although these features optimize search performance, they require users to learn the related syntax or understand the indexing workflow. A query understanding framework is proposed to better interpret users’ search intents in Earth data search engines by analyzing metadata and user query logs with advanced natural language processing algorithms, making good use of the valuable query-related knowledge hidden in metadata and Web usage data [5,6,7]. An oceanographic data discovery portal is used to evaluate the utility of the query understanding framework.

Related Research
Query Understanding Framework Architecture
Spatial and Temporal Parsing
Concept Recognition
Named Entity Recognition
Query Expansion
System Design
Experiment Setup
Evaluation on a Sample
Quantitative Evaluation on a Set of Synthetic Queries
Conclusion and Future Work