Abstract

Box queries (or window queries) are a type of query which specifies a set of allowed values in each dimension. Indexing feature vectors in the multidimensional Nonordered Discrete Data Spaces (NDDS) for efficient box queries are becoming increasingly important in many application domains such as genome sequence databases. Most of the existing work in this field targets the similarity queries (range queries and k-NN queries). Box queries, however, are fundamentally different from similarity queries. Hence, the same indexing schemes designed for similarity queries may not be efficient for box queries. In this paper, we present a new indexing structure specifically designed for box queries in the NDDS. Unique characteristics of the NDDS are exploited to develop new node splitting heuristics. For the BoND-tree, we also provide theoretical analysis to show the optimality of the proposed heuristics. Extensive experiments with synthetic data demonstrate that the proposed scheme is significantly more efficient than the existing ones when applied to support box queries in NDDSs. We also show effectiveness of the proposed scheme in a real-world application of primer design for genome sequence databases.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call