Top-k Query Processing Algorithms Research Articles

Purpose: This paper proposes a new keyword search method on graph data that improves the relevance of search results and reduces the duplication of content nodes in the answer trees produced by previous approaches based on distinct root semantics. Those approaches are restricted to finding answer trees with different root nodes and thus often return answer trees that have low relevance to the query or that duplicate content nodes. The proposed method allows limited redundancy in the root nodes of the top-k answer trees to produce more effective query results.

Design/methodology/approach: A measure of redundancy in a set of answer trees with respect to their root nodes is defined, and, according to this measure, a set of answer trees with limited root redundancy is proposed as the result of a keyword query on graph data. For efficient query processing, an index on the useful paths in the graph, built from inverted lists and a hash map, is suggested. Based on the path index, a top-k query processing algorithm is presented that finds the most relevant and diverse answer trees given a maximum amount of root redundancy allowed for a set of answer trees.

Findings: Experiments on real graph datasets show that the proposed approach can produce effective query answers that are more diverse in their content nodes and more relevant to the query than the previous approach based on distinct root semantics.

Originality/value: This paper is the first to take redundancy in the root nodes of answer trees into account in order to improve relevance and reduce content-node redundancy in query results relative to the previous distinct root semantics. It can satisfy users' various information needs on large and complex graph data using keyword-based queries.
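The idea of selecting top-k answer trees under a cap on root redundancy can be illustrated with a small sketch. This is not the paper's algorithm or its redundancy measure, neither of which is given in the abstract. It assumes, purely for illustration, that redundancy is counted as the number of selected trees whose root node was already used by a higher-ranked tree. The names `AnswerTree`, `select_top_k`, and `max_redundancy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerTree:
    root: str     # identifier of the tree's root node
    score: float  # relevance to the query; higher is better


def select_top_k(candidates, k, max_redundancy):
    """Greedily pick up to k trees by descending score.

    Assumption (not from the paper): redundancy is the number of selected
    trees whose root already appears in an earlier selected tree.
    """
    selected = []
    used_roots = set()
    redundancy = 0
    for tree in sorted(candidates, key=lambda t: t.score, reverse=True):
        if len(selected) == k:
            break
        if tree.root in used_roots:
            if redundancy == max_redundancy:
                continue  # budget exhausted: skip trees with a repeated root
            redundancy += 1
        used_roots.add(tree.root)
        selected.append(tree)
    return selected


candidates = [
    AnswerTree("a", 0.9),
    AnswerTree("a", 0.8),
    AnswerTree("a", 0.7),
    AnswerTree("b", 0.4),
    AnswerTree("c", 0.3),
]

# max_redundancy=0 behaves like distinct root semantics.
distinct = select_top_k(candidates, k=3, max_redundancy=0)
# Allowing one repeated root admits the second-best tree rooted at "a".
limited = select_top_k(candidates, k=3, max_redundancy=1)
```

With a budget of zero the sketch returns one tree per root, so lower-scoring trees rooted at "b" and "c" displace a higher-scoring tree rooted at "a". A budget of one lets that tree back in, which is the relevance trade-off the abstract describes.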

With the prevalence of Web search engines, keyword search has become the most popular way for users to retrieve information from text documents. On the other hand, there is an enormous amount of valuable information stored in structured form (relational or semistructured) in Internet, intranet, and enterprise databases. To query such data sources, users traditionally depended on specialized applications because for most users it is difficult to use structured or semistructured query languages. In recent years, enterprise search has gained popularity where a keyword-based search model is used for intranet data sources. However, in most of these systems, the structured data objects that can be retrieved via keyword search have to be predefined. The database research community has been focusing on developing some of the key technology that holds the promise of generalizing the reach of keyword search over structured and semistructured data beyond the state of the practice in commercial enterprise search engines. Some of the problems that have received attention include the task of automatically assembling a data object on the fly in response to a keyword search query over structured or semistructured data, designing an appropriate ranking function, and supporting top-k retrieval efficiently for the ranking functions.

This special section of the IEEE Transactions on Knowledge and Data Engineering (TKDE) features a collection of four papers, selected from 16 submissions, representing recent advances in keyword search on structured data. These works present novel techniques for searching relational databases, text-rich databases, as well as XML data.

The first paper, “SPARK2: Top-k Keyword Query in Relational Databases” by Yi Luo, Wei Wang, Xuemin Lin, Xiaofang Zhou, Jianmin Wang, and Keqiu Li, addresses the effectiveness and efficiency challenges of keyword search on relational databases. The authors propose a new ranking method that adapts the state-of-the-art IR ranking principles for keyword search over structured data. However, in generating top-k ranked results efficiently, the nonmonotonic nature of this ranking function renders known top-k query processing techniques inapplicable. To address the challenge, the authors propose a set of efficient top-k query processing algorithms for this ranking method that minimize database probing by leveraging novel score upper bounding functions.

In the second paper, “Finding Top-k Answers in Keyword Search over Relational Databases Using Tuple Units,” Jianhua Feng, Guoliang Li, and Jianyong Wang use indexes to record joined tuples (named tuple units) in the databases. In contrast to existing work where a query result is a single tuple unit, this paper allows multiple related tuple units to be leveraged to answer a keyword query to improve search quality. To enhance the performance, the authors propose two indexes that capture relationships between different tuple units, and then develop new ranking techniques and algorithms to progressively find the top-k query results.

The third paper is “Efficient Keyword-Based Search for Top-K Cells in Text Cube” by Bolin Ding, Bo Zhao, Cindy Xide Lin, Jiawei Han, Chengxiang Zhai, Ashok Srivastava, and Nikunj C. Oza. It focuses on the scenario where the repository contains both structured and text data. Specifically, it studies the problem of keyword search in a text cube, built on a multidimensional text database where each row is associated with a document and several structured dimensions. Unlike existing work where an individual document or a (joined) tuple is a query result, this work considers a cell as a query result. Given a keyword query, the goal of this paper is to find the top-k most relevant cells. The authors develop an IR-style relevance model for ranking cells, and then propose efficient algorithms to address the computational challenge due to the large number of cells in a text cube.

The final paper in this special section, “Returning Clustered Results for Keyword Search on XML Documents” by Xiping Liu, Changxuan Wan, and Lei Chen, presents a new semantics for answering keyword queries on XML data and techniques to generate clustered search results. The authors propose an efficient algorithm that clusters results on the fly by first generating cluster labels and then clustered results. Furthermore, they propose a technique that constructs a cluster hierarchy that is interpretable and provides a general-to-specific view of the results.

We would like to thank all of the authors who submitted papers to this special section for their high-quality contributions. We also thank the referees for their generous help and valuable suggestions. We are grateful to Professor Beng-Chin Ooi, the Editor-in-Chief of TKDE, for his strong support for this special section.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 23, NO. 12, DECEMBER 2011 1761

Articles published on Top-k Query Processing Algorithms

Comparative Studies on Intelligent Swarming Network (iSWAN) Geno-Generative Algorithm and Top-K Query Processing Algorithm

Many are Better than One: Algorithm Selection for Faster Top-K Retrieval

Privacy-Preserving Top-k Query Processing Algorithms Using Efficient Secure Protocols over Encrypted Database in Cloud Computing Environment

A new Top-k query processing algorithm to guarantee confidentiality of data and user queries on outsourced databases

Effective keyword search on graph data using limited root redundancy of answer trees

Waves: a fast multi-tier top-k query processing algorithm

An Efficient Top-k Query Processing Algorithm Using a Grid Index-Based View Selection Technique (그리드 인덱스 기반 뷰 선택 기법을 이용한 효율적인 Top-k 질의처리 알고리즘)

Efficient Top-k Query Processing Algorithms in Highly Distributed Environments

IKernel: Exact indexing for support vector machines

Crowdsourced Trace Similarity with Smartphones

Guest Editors' Introduction: Special Section on Keyword Search on Structured Data

Exact Top-K Queries in Wireless Sensor Networks

Power efficiency through tuple ranking in wireless sensor network monitoring

Efficient Top-k Query Processing in Pure Peer-to-Peer Network
