Abstract

Superset query is widely used in object-oriented databases, data mining, and many other fields. A trie is an efficient index for superset query, but most existing trie indexes aim to improve query performance while ignoring storage overhead. To address this problem, we propose an efficient extended Level-Ordered Unary Degree Sequence (LOUDS) index: Ext-LOUDS. Ext-LOUDS represents a trie with one integer vector and three bit vectors, and directly maps each NodeID to its corresponding position, thus accelerating some key operations needed for superset query. Based on Ext-LOUDS, an efficient superset query algorithm, ELOUDS-Super, is designed. Experimental results on both real and synthetic datasets show that Ext-LOUDS reduces space overhead by 50%–60% compared with a trie while maintaining relatively good query performance.

Highlights

  • With the rapid development in e-commerce, Internet of Things and many other fields, both the scale and complexity of data are increasing

  • We focus on superset query: given a query set Q, retrieve every set in a dataset D of sets that is a subset of Q (that is, Q is a superset of each returned set)

  • Perform a SELECT operation on the IsFirstChild vector to obtain the starting position pstart and the ending position pend of the child nodes of the node indicated by node_num. Perform a binary search to obtain the position p of the current query element Q[level] in the Elems vector. If the node corresponding to p is an end node, perform a RANK operation on the IsEnd vector to obtain the qualifying sets corresponding to the node, and merge them into the result set. If the node corresponding to p has child nodes, obtain its internalID and execute the algorithm recursively
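The steps in the last highlight can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names Elems, IsFirstChild, and IsEnd come from the text, while the third bit vector (called has_child here), the numbering of internal nodes in level order with the root as 0, the loop over the remaining query elements, and the naive linear-time rank1 and select1 helpers are all assumptions made for the example.

```python
from bisect import bisect_left
from collections import deque


def rank1(bits, i):
    """Number of 1s in bits[0:i] (naive linear scan)."""
    return sum(bits[:i])


def select1(bits, k):
    """Position of the k-th 1 (k starts at 1), or len(bits) if there is none."""
    seen = 0
    for p, b in enumerate(bits):
        seen += b
        if b and seen == k:
            return p
    return len(bits)


def build(sets):
    """Flatten a trie of sorted, non-empty sets into level-order vectors."""
    root = {"kids": {}, "end": None}
    for s in sets:
        node = root
        for e in sorted(s):
            node = node["kids"].setdefault(e, {"kids": {}, "end": None})
        node["end"] = tuple(sorted(s))
    elems, first, is_end, has_child, end_sets = [], [], [], [], []
    queue = deque([root])  # holds only nodes that have children
    while queue:
        node = queue.popleft()
        for i, e in enumerate(sorted(node["kids"])):
            child = node["kids"][e]
            elems.append(e)
            first.append(1 if i == 0 else 0)
            is_end.append(1 if child["end"] is not None else 0)
            has_child.append(1 if child["kids"] else 0)
            if child["end"] is not None:
                end_sets.append(child["end"])  # indexed by rank in is_end
            if child["kids"]:
                queue.append(child)
    return elems, first, is_end, has_child, end_sets


def superset_query(index, query):
    """Return every indexed set that is a subset of `query`."""
    elems, first, is_end, has_child, end_sets = index
    q = sorted(set(query))
    found = []

    def visit(internal_id, level):
        # SELECT on IsFirstChild delimits the children of this node.
        pstart = select1(first, internal_id + 1)
        pend = select1(first, internal_id + 2)
        for i in range(level, len(q)):
            # Binary search for the query element among the children.
            p = bisect_left(elems, q[i], pstart, pend)
            if p == pend or elems[p] != q[i]:
                continue
            if is_end[p]:
                # RANK on IsEnd identifies the set ending at this node.
                found.append(end_sets[rank1(is_end, p)])
            if has_child[p]:
                visit(rank1(has_child, p + 1), i + 1)

    if elems:
        visit(0, 0)
    return found


data = [{1, 2}, {1, 3}, {2}, {2, 3, 4}, {5}]
index = build(data)
result = superset_query(index, {1, 2, 3})
```

Here `result` holds the sets {1, 2}, {1, 3}, and {2} as sorted tuples. A real succinct index would replace the linear scans in rank1 and select1 with auxiliary directories so that both run in constant or near-constant time.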

Summary

Introduction

With the rapid development of e-commerce, the Internet of Things, and many other fields, both the scale and complexity of data are increasing. To effectively support set queries, a trie often needs to be extended with additional attributes (e.g., the prefix set of the current node [10], or a link to the node with the same label [9]), which are usually byte or integer types. These pointers and extensions inevitably increase the overhead of the trie, thereby affecting its scalability, especially on large datasets. Some recent works [18,19,20] studied efficient RANK & SELECT operations to improve the retrieval performance of LOUDS. He et al. [21] designed a novel succinct structure that supports the mapping between preorder ranks and level-order ranks of nodes in constant time. Experimental results on two real datasets show that Ext-LOUDS can reduce space overhead by up to 50%–60% without significantly reducing query performance.
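To make the RANK & SELECT operations mentioned above concrete, the following is a small Python sketch of a bit vector with sampled prefix counts. It is an illustrative simplification, not the structure used in the cited works: the class name RankSelect and the block parameter are hypothetical, and practical succinct implementations rely on machine-word popcounts and multi-level directories rather than Python lists.

```python
from bisect import bisect_left


class RankSelect:
    """Bit vector with one stored prefix count per block (illustrative only)."""

    def __init__(self, bits, block=4):
        self.bits = list(bits)
        self.block = block
        # prefix[j] = number of 1s in bits[0 : j * block]
        self.prefix = [0]
        for start in range(0, len(self.bits), block):
            ones = sum(self.bits[start:start + block])
            self.prefix.append(self.prefix[-1] + ones)

    def rank1(self, i):
        """Number of 1s in bits[0:i], for 0 <= i <= len(bits)."""
        j = i // self.block
        return self.prefix[j] + sum(self.bits[j * self.block:i])

    def select1(self, k):
        """Position of the k-th 1 (k starts at 1), or len(bits) if there is none."""
        if k < 1 or k > self.prefix[-1]:
            return len(self.bits)
        # Locate the block containing the k-th 1, then scan inside it.
        j = bisect_left(self.prefix, k) - 1
        seen = self.prefix[j]
        for p in range(j * self.block, len(self.bits)):
            seen += self.bits[p]
            if seen == k:
                return p


rs = RankSelect([1, 0, 0, 1, 0, 1, 1], block=4)
ones_before_6 = rs.rank1(6)   # 1s at positions 0, 3, 5
third_one = rs.select1(3)     # position of the third 1
```

With the stored prefix counts, rank1 touches at most one block and select1 needs a binary search over the block table plus a scan of one block, instead of a scan of the whole vector.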

Set Superset Query
Ext-LOUDS
ELOUDS-Super Algorithm
Algorithm Complexity Analysis
Experimental Environment and Datasets
Real Datasets
Synthetic Datasets
Experiments on Real Datasets
4.2.2. Experiments
Experiments on Synthetic Datasets
Findings
Conclusions and Future