Abstract
Purpose This paper aims to propose a new keyword search method on graph data to improve the relevance of search results and reduce duplication of content nodes in the answer trees obtained by previous approaches based on distinct root semantics. The previous approaches are restricted to find answer trees having different root nodes and thus often generate a result consisting of answer trees with low relevance to the query or duplicate content nodes. The method allows limited redundancy in the root nodes of top-k answer trees to produce more effective query results. Design/methodology/approach A measure for redundancy in a set of answer trees regarding their root nodes is defined, and according to the metric, a set of answer trees with limited root redundancy is proposed for the result of a keyword query on graph data. For efficient query processing, an index on the useful paths in the graph using inverted lists and a hash map is suggested. Then, based on the path index, a top-k query processing algorithm is presented to find most relevant and diverse answer trees given a maximum amount of root redundancy allowed for a set of answer trees. Findings The results of experiments using real graph datasets show that the proposed approach can produce effective query answers which are more diverse in the content nodes and more relevant to the query than the previous approach based on distinct root semantics. Originality/value This paper first takes redundancy in the root nodes of answer trees into account to improve the relevance and content nodes redundancy of query results over the previous distinct root semantics. It can satisfy the users’ various information need on a large and complex graph data using a keyword-based query.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.