Abstract
Composing queries is evidently a tedious task. This is particularly true of graph queries, which are typically complex and error-prone, a problem compounded by the fact that graph schemas can be missing or too loose to help with query formulation. Despite the great success of query formulation aids, in particular automatic query completion, graph query autocompletion has received much less research attention. In this paper, we propose a novel framework for subgraph query autocompletion (called AutoG). Given an initial query q and a user’s preference as input, AutoG returns ranked query suggestions $$Q'$$ as output. Users may choose a query from $$Q'$$ and iteratively apply AutoG to compose their queries. The novelties of AutoG are as follows: First, we formalize query composition. Second, we propose to increment a query with logical units called c-prime features, which are (i) frequent subgraphs and (ii) constructed from smaller c-prime features in no more than c ways. Third, we propose algorithms to rank candidate suggestions. Fourth, we propose a novel index called feature Dag (FDag) to optimize the ranking. We study the query suggestion quality with simulations and real users and conduct an extensive performance evaluation. The results show that the query suggestions are useful (they saved roughly 40% of users’ mouse clicks) and that AutoG returns suggestions quickly under a large variety of parameter settings.
Highlights
The prevalence of graph-structured data in modern real-world applications such as biological and chemical databases (e.g., PubChem) and co-purchase networks (e.g., Amazon.com) has led to a rejuvenation of research on graph data management and analytics.
To optimize the ranked subgraph query suggestion (RSQ) problem, we propose a novel index for c-prime features, called feature DAG (FDAG).
The time complexity of Algo. 1 is $$O(|F_q| \times T_{subiso} + |E|^2 \times |M_{F_q}|)$$, where $$F_q$$ denotes the features of the query, (a) the first term is the time for determining the embeddings of $$F_q$$ in q, with $$T_{subiso}$$ the time for one subgraph isomorphism call, and (b) the second term is for scanning the $$|M_{F_q}|$$ embeddings to cover $$O(|E|)$$ edges in the FIND function, which is invoked $$O(|E|)$$ times.
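One way to read the second term of this bound is as the number of FIND calls multiplied by the cost of each call. The per-call cost below is an interpretation of the description above (each call scans the embeddings for each of the edges it covers). It is not a restatement of the paper's own proof, and $$M_{F_q}$$ here stands for the set of embeddings of the query features:

$$\underbrace{O(|E|)}_{\text{calls to FIND}} \times \underbrace{O(|E| \times |M_{F_q}|)}_{\text{assumed cost per call}} = O(|E|^2 \times |M_{F_q}|)$$

Under this reading, the first term grows with the number of query features and the cost of subgraph isomorphism, while the second term grows with the number of edges and the number of embeddings.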
Summary
The prevalence of graph-structured data in modern real-world applications such as biological and chemical databases (e.g., PubChem) and co-purchase networks (e.g., Amazon.com) has led to a rejuvenation of research on graph data management and analytics. Chemists are rarely expected to learn the complex syntax of a graph query language in order to formulate meaningful queries over a chemical compound database such as PubChem or eMolecule. There have been increasing efforts in academia [18] and industry (e.g., PubChem and eMolecule) to create user-friendly GUIs that ease the burden of query formulation. Given a partially constructed visual subgraph query, it is desirable to suggest the top-k query fragments that the user may add to the intermediate query in subsequent steps.