Abstract
Superset queries are widely used in object-oriented databases, data mining, and many other fields. A trie is an efficient index for superset queries, but most existing trie indexes aim to improve query performance while ignoring storage overhead. To address this problem, we propose an efficient extended Level-Ordered Unary Degree Sequence (LOUDS) index, Ext-LOUDS. Ext-LOUDS represents a trie with one integer vector and three bit vectors and directly maps each NodeID to its corresponding position, thus accelerating key operations needed for superset queries. Based on Ext-LOUDS, we design an efficient superset query algorithm, ELOUDS-Super. Experimental results on both real and synthetic datasets show that Ext-LOUDS reduces space overhead by 50%–60% compared with a trie while maintaining relatively good query performance.
Highlights
With the rapid development of e-commerce, the Internet of Things, and many other fields, both the scale and the complexity of data are increasing
We focus on the superset query: given a query set Q, retrieve every set in a dataset D of sets that is a subset of Q (that is, Q is a superset of each returned set)
Perform a SELECT operation on the IsFirstChild vector to obtain the starting position pstart and the ending position pend of the children of the node identified by node_num; perform a binary search within that range to find the position p of the current query element Q[level] in the Elems vector; if the node at p is an end node, perform a RANK operation on the IsEnd vector to obtain the qualifying sets for that node and merge them into the result set; if the node at p has child nodes, obtain its internalID and execute the algorithm recursively
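The steps in the last highlight can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the vector names Elems, IsFirstChild and IsEnd come from the text, while the third bit vector is assumed here to be a HasChild flag, and `build`, `superset_query`, `rank1`, `select1` and `end_sets` are hypothetical names. RANK and SELECT are written as naive linear scans, and the loop over the remaining query elements is one plausible reading of how the recursion advances through Q.

```python
from bisect import bisect_left
from collections import deque

def rank1(bits, i):
    """Number of 1s in bits[0:i] (naive scan; a real index would precompute this)."""
    return sum(bits[:i])

def select1(bits, k):
    """Position of the k-th 1 (1-indexed), or len(bits) if there are fewer than k ones."""
    seen = 0
    for pos, b in enumerate(bits):
        seen += b
        if b and seen == k:
            return pos
    return len(bits)

def build(dataset):
    """Encode a trie of sorted, non-empty sets as level-ordered vectors (assumed layout)."""
    root = {"kids": {}, "set": None}
    for s in dataset:
        node = root
        for e in sorted(s):
            node = node["kids"].setdefault(e, {"kids": {}, "set": None})
        node["set"] = tuple(sorted(s))
    elems, is_first, has_child, is_end, end_sets = [], [], [], [], []
    queue = deque([root])
    while queue:  # breadth-first traversal yields the level order
        node = queue.popleft()
        for k, label in enumerate(sorted(node["kids"])):
            child = node["kids"][label]
            elems.append(label)
            is_first.append(1 if k == 0 else 0)
            has_child.append(1 if child["kids"] else 0)  # HasChild is an assumed name
            is_end.append(1 if child["set"] is not None else 0)
            if child["set"] is not None:
                end_sets.append(child["set"])  # i-th end node in level order
            queue.append(child)
    return elems, is_first, has_child, is_end, end_sets

def superset_query(index, query):
    """Return every indexed set that is a subset of `query`."""
    elems, is_first, has_child, is_end, end_sets = index
    q = sorted(set(query))
    result = []

    def search(internal_id, level):
        # SELECT on IsFirstChild delimits the children of this internal node.
        pstart = select1(is_first, internal_id + 1)
        pend = select1(is_first, internal_id + 2)
        for j in range(level, len(q)):
            # Binary search for the query element among the sorted children.
            p = bisect_left(elems, q[j], pstart, pend)
            if p == pend or elems[p] != q[j]:
                continue
            if is_end[p]:
                # RANK on IsEnd maps the end node to its stored set.
                result.append(end_sets[rank1(is_end, p)])
            if has_child[p]:
                # The root is internal node 0, so this node's ID is the inclusive rank.
                search(rank1(has_child, p + 1), j + 1)

    if elems:
        search(0, 0)
    return result

index = build([{1, 2}, {2, 3}, {1, 2, 3}, {4}, {2}])
print(superset_query(index, {1, 2, 3}))  # [(1, 2), (1, 2, 3), (2,), (2, 3)]
```

In this sketch the only per-node data are one integer label and three bits, which is the kind of saving over pointer-based trie nodes that the abstract describes; the actual Ext-LOUDS layout may differ in detail.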
Summary
With the rapid development of e-commerce, the Internet of Things, and many other fields, both the scale and the complexity of data are increasing. To support set queries effectively, a trie often needs to be extended with additional attributes (e.g., the prefix set of the current node [10], or a link to the node with the same label [9]), which are usually byte or integer types. These pointers and extensions inevitably increase the overhead of the trie and thereby limit its scalability, especially on large datasets. Some recent works [18,19,20] studied efficient RANK and SELECT operations to improve the retrieval performance of LOUDS. He et al. [21] designed a novel succinct structure that supports mapping between the preorder ranks and level-order ranks of nodes in constant time. Experimental results on two real datasets show that Ext-LOUDS can reduce space overhead by up to 50%–60% without significantly reducing query performance.
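To make the RANK and SELECT operations mentioned above concrete, the following Python sketch shows one common way to speed them up: store a running count of 1s at block boundaries so that a query only scans a single block. This is a generic illustration and not the method of the cited works; the class name `RankBits` and the block size of 64 are arbitrary choices made for the example.

```python
from bisect import bisect_left

class RankBits:
    BLOCK = 64  # assumed block size, chosen only for illustration

    def __init__(self, bits):
        self.bits = list(bits)
        # prefix[b] holds the number of 1s that occur before block b.
        self.prefix = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            ones = sum(self.bits[start:start + self.BLOCK])
            self.prefix.append(self.prefix[-1] + ones)

    def rank1(self, i):
        """Number of 1s in bits[0:i]: one table lookup plus a scan inside one block."""
        b, r = divmod(i, self.BLOCK)
        start = b * self.BLOCK
        return self.prefix[b] + sum(self.bits[start:start + r])

    def select1(self, k):
        """Position of the k-th 1 (1-indexed), or len(bits) if it does not exist."""
        if k < 1 or k > self.prefix[-1]:
            return len(self.bits)
        # Find the block b with prefix[b] < k <= prefix[b + 1].
        b = bisect_left(self.prefix, k) - 1
        count = self.prefix[b]
        start = b * self.BLOCK
        for pos in range(start, min(len(self.bits), start + self.BLOCK)):
            count += self.bits[pos]
            if self.bits[pos] and count == k:
                return pos
        return len(self.bits)

rb = RankBits([1, 0, 0, 1, 1, 1] * 30)
print(rb.rank1(100), rb.select1(50))
```

The extra table costs one counter per block, so the bit vector stays compact while both operations avoid scanning from the start of the vector.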