Space-Time Tradeoffs for Conjunctive Queries with Access Patterns
In this article, we investigate space-time tradeoffs for answering conjunctive queries with access patterns (CQAPs). The goal is to create a space-efficient data structure in an initial preprocessing phase and use it for answering (multiple) queries in an online phase. Previous work has developed data structures that trade off space usage for answering time for queries of practical interest, such as the path and triangle queries. However, these approaches lack a comprehensive framework and are not generalizable. Our main contribution is a general algorithmic framework for obtaining space-time tradeoffs for any CQAP. Our framework builds upon the PANDA algorithm and tree decomposition techniques. We demonstrate that our framework captures all state-of-the-art tradeoffs that were independently produced for various queries. Furthermore, we show surprising improvements over the state-of-the-art tradeoffs known in the existing literature for reachability queries.
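To make the notion of a space-time tradeoff concrete, the following minimal sketch handles one simple CQAP, the Boolean two-path query: given input values (x, z), is there a y with R(x, y) and S(y, z)? It uses the classic heavy/light partitioning idea and is not the framework described in the abstract. The class name `TwoPathIndex` and the `threshold` parameter are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class TwoPathIndex:
    """Answers: given (x, z), is there a y with (x, y) in R and (y, z) in S?"""

    def __init__(self, R, S, threshold):
        # Preprocessing phase: index both relations by the input variables.
        self.out = defaultdict(set)  # x -> {y : (x, y) in R}
        self.inc = defaultdict(set)  # z -> {y : (y, z) in S}
        for x, y in R:
            self.out[x].add(y)
        for y, z in S:
            self.inc[z].add(y)

        # A value is "heavy" if it has more than `threshold` neighbours.
        heavy_x = [x for x, ys in self.out.items() if len(ys) > threshold]
        heavy_z = [z for z, ys in self.inc.items() if len(ys) > threshold]

        # Materialize answers only for heavy-heavy pairs. There are fewer than
        # (|R| / threshold) * (|S| / threshold) such pairs, which bounds the
        # extra space.
        self.heavy = {
            (x, z): not self.out[x].isdisjoint(self.inc[z])
            for x in heavy_x
            for z in heavy_z
        }

    def query(self, x, z):
        # Online phase: heavy-heavy pairs are answered by a table lookup.
        if (x, z) in self.heavy:
            return self.heavy[(x, z)]
        # Otherwise at least one side is light, so scanning the smaller
        # neighbour set costs at most `threshold` membership probes.
        a = self.out.get(x, set())
        b = self.inc.get(z, set())
        small, large = (a, b) if len(a) <= len(b) else (b, a)
        return any(y in large for y in small)
```

Raising `threshold` shrinks the materialized table and lengthens the worst-case scan at query time; lowering it does the opposite. This is the kind of tunable tradeoff between space and answering time that the abstract refers to, shown here for a single hand-picked query only.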
- Conference Article
2
- 10.1145/3584372.3588675
- Jun 18, 2023
In this paper, we investigate space-time tradeoffs for answering conjunctive queries with access patterns (CQAPs). The goal is to create a space-efficient data structure in an initial preprocessing phase and use it for answering (multiple) queries in an online phase. Previous work has developed data structures that trade off space usage for answering time for queries of practical interest, such as the path and triangle queries. However, these approaches lack a comprehensive framework and are not generalizable. Our main contribution is a general algorithmic framework for obtaining space-time tradeoffs for any CQAP. Our framework builds upon the PANDA algorithm and tree decomposition techniques. We demonstrate that our framework captures all state-of-the-art tradeoffs that were independently produced for various queries. Further, we show surprising improvements over the state-of-the-art tradeoffs known in the existing literature for reachability queries.
- Book Chapter
37
- 10.1007/3-540-44503-x_15
- Jan 1, 2001
Abstract. In information-integration systems, source relations often have limitations on access patterns to their data; i.e., when one must provide values for certain attributes of a relation in order to retrieve its tuples. In this paper we consider the following fundamental problem: can we compute the complete answer to a query by accessing the relations with legal patterns? The complete answer to a query is the answer that we could compute if we could retrieve all the tuples from the relations. We give algorithms for solving the problem for various classes of queries, including conjunctive queries, unions of conjunctive queries, and conjunctive queries with arithmetic comparisons. We prove the problem is undecidable for datalog queries. If the complete answer to a query cannot be computed, we often need to compute its maximal answer. The second problem we study is, given two conjunctive queries on relations with limited access patterns, how to test whether the maximal answer to the first query is contained in the maximal answer to the second one? We show this problem is decidable using the results of monadic programs.
- Conference Article
14
- 10.1109/icde.2019.00052
- Apr 1, 2019
This paper studies the deletion propagation problem in terms of minimizing view side-effect. It is a problem fundamental to data lineage and quality management, and it could be a key step in analyzing view propagation and repairing data. The investigated problem is a variant of the standard deletion propagation problem, where given a source database D, a set of key-preserving conjunctive queries Q, and the set of views V obtained by the queries in Q, we try to identify a set T of tuples from D whose elimination prevents all the tuples in a given set of deletions on views △V while preserving any other results. The complexity of this problem has been well studied for the case with only a single query. Dichotomies, even trichotomies, for different settings are developed. However, no results are given for multiple queries, which is the more realistic case. We study the complexity and approximations of optimizing the side-effect on the views, i.e., find T to minimize the additional damage on V after removing all the tuples of △V. We focus on the class of key-preserving conjunctive queries which is a dichotomy for the single query case. It is surprising to find that, except in the single-query case, this problem is NP-hard to approximate within any constant even for a non-trivial set of multiple project-free conjunctive queries in terms of view side-effect. The proposed algorithm shows that it can be approximated within a bound depending on the number of tuples of both V and △V. We identify a class of polynomially tractable inputs, and provide a dynamic programming algorithm to solve the problem. Besides data lineage, study of this problem could also provide important foundations for the computational issues in data repairing. Furthermore, we introduce some related applications of this problem, especially for query-feedback-based data cleaning.
- Book Chapter
451
- 10.1007/3-540-45571-x_47
- Jan 1, 2000
With the explosive growth of data available on the World Wide Web, discovery and analysis of useful information from the World Wide Web becomes a practical necessity. A Web access pattern, which is a sequence of accesses frequently pursued by users, is a kind of interesting and useful knowledge in practice. In this paper, we study the problem of mining access patterns from Web logs efficiently. A novel data structure, called Web access pattern tree, or WAP-tree for short, is developed for efficient mining of access patterns from pieces of logs. The Web access pattern tree stores highly compressed, critical information for access pattern mining and facilitates the development of novel algorithms for mining access patterns in large sets of log pieces. Our algorithm can find access patterns from Web logs quite efficiently. The experimental and performance studies show that our method is in general an order of magnitude faster than conventional methods.
- Conference Article
30
- 10.1109/infcom.2001.916640
- Apr 22, 2001
The problem of fast address lookup is crucial to routing and thus has received considerable attention. Most of the work in this field has focused on improving the speed of individual accesses, independent of the underlying access pattern. Gupta et al. (2000) proposed an efficient data structure to exploit the bias in access pattern. This technique achieves faster lookups for more frequently accessed keys while bounding the worst case lookup time; in fact it is (near) optimal under constraints on worst case performance. However, it needs to be rebuilt periodically to reflect the changes in access patterns, which can be inefficient for bursty environments. In this paper we introduce a new dynamic data structure to exploit biases in the access pattern, which tend to change dynamically. Previous work shows that there are many circumstances under which access patterns change quickly. Our data structure, which we call the biased skip list (BSL), has a self-update mechanism which reflects the changes in the access patterns efficiently and immediately, without any need for rebuilding. It improves throughput while keeping the worst case access time bounded by that of the fastest (unbiased) schemes. We demonstrate the practicality of BSL by experiments on data with varying degrees of burstiness.
- Research Article
- 10.46298/lmcs-21(2:23)2025
- Jun 16, 2025
- Logical Methods in Computer Science
We study the problem of answering conjunctive queries with free access patterns (CQAPs) under updates. A free access pattern is a partition of the free variables of the query into input and output. The query returns tuples over the output variables given a tuple of values over the input variables. We introduce a fully dynamic evaluation approach that works for all CQAPs and is optimal for two classes of CQAPs. This approach recovers prior work on the dynamic evaluation of conjunctive queries without access patterns. We first give a syntactic characterisation of all CQAPs that admit constant time per single-tuple update and whose output tuples can be enumerated with constant delay given a tuple of values over the input variables. We further chart the complexity trade-off between the preprocessing time, update time and enumeration delay for a class of CQAPs. For some of these CQAPs, our approach achieves optimal, albeit non-constant, update time and delay. This optimality is predicated on the Online Matrix-Vector Multiplication conjecture. We finally adapt our approach to the dynamic evaluation of tractable CQAPs over probabilistic databases under updates.
- Research Article
76
- 10.1007/s00778-002-0085-6
- Oct 1, 2003
- The VLDB Journal The International Journal on Very Large Data Bases
Abstract. In data applications such as information integration, there can be limited access patterns to relations, i.e., binding patterns require values to be specified for certain attributes in order to retrieve data from a relation. As a consequence, we cannot retrieve all tuples from these relations. In this article we study the problem of computing the complete answer to a query, i.e., the answer that could be computed if all the tuples could be retrieved. A query is stable if for any instance of the relations in the query, its complete answer can be computed using the access patterns permitted by the relations. We study the problem of testing stability of various classes of queries, including conjunctive queries, unions of conjunctive queries, and conjunctive queries with arithmetic comparisons. We give algorithms and complexity results for these classes of queries. We show that stability of datalog programs is undecidable, and give a sufficient condition for stability of datalog queries. Finally, we study data-dependent computability of the complete answer to a nonstable query, and propose a decision tree for guiding the process to compute the complete answer.
- Book Chapter
12
- 10.1007/3-540-44808-x_18
- Jan 1, 2001
Dynamic tables that support search, insert and delete operations are fundamental and well studied in computer science. There are many well known data structures that solve this problem, including balanced binary trees, skip lists and tries among others. Many of the existing data structures work efficiently when the access patterns are uniform, but in many circumstances access patterns are biased. Various data structures have been proposed that exploit bias in access patterns to improve efficiency for the operations they support. In this paper we introduce a new data structure, the biased skip list (BSL), which is designed to work with biased access distributions. Specifically, given key k, let its rank r(k) be the number of distinct keys accessed since the last access to k. BSL enables one to search for k in O(log r(k)) expected time. Insertions and deletions take O(log r_max(k)) expected time, where r_max(k) denotes the maximum rank of k during its lifespan. Our work is motivated by recent studies on packet filtering and classification where keys have been found to have geometric (or more skewed) access probabilities as a function of how recently they have been accessed. We demonstrate the practicality of BSL with experiments on real and synthetic data with various degrees of bias.
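The rank definition in the abstract above can be illustrated with a short sketch. It computes only r(k) for each access in a sequence and does not implement the biased skip list itself. The function name `access_ranks` is hypothetical, and the sketch assumes that the rank counts distinct keys other than k accessed strictly between two consecutive accesses to k, with `None` reported for a first access.

```python
def access_ranks(accesses):
    """Return, for each access, the rank of the accessed key at that moment.

    The rank is the number of distinct keys accessed since the previous
    access to the same key. A first access has no previous one, so it is
    reported as None (an assumption of this sketch).
    """
    last_seen = {}  # key -> index of its most recent access
    ranks = []
    for i, key in enumerate(accesses):
        if key in last_seen:
            # Distinct keys touched strictly between the two accesses.
            between = accesses[last_seen[key] + 1:i]
            ranks.append(len(set(between)))
        else:
            ranks.append(None)
        last_seen[key] = i
    return ranks
```

A key that is accessed again right away has rank 0, while a key that has not been touched for a long stretch of varied accesses has a large rank. Under the bounds quoted in the abstract, the former would be cheap to find and the latter more expensive.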
- Book Chapter
4
- 10.1007/978-3-642-04930-9_10
- Jan 1, 2009
Scalable query answering over Description Logic (DL) based ontologies plays an important role for the success of the Semantic Web. Towards tackling the scalability problem, we propose a decomposition-based approach to optimizing existing OWL DL reasoners in evaluating conjunctive queries in OWL DL ontologies. The main idea is to decompose a given OWL DL ontology into a set of target ontologies without duplicated ABox axioms so that the evaluation of a given conjunctive query can be separately performed in every target ontology by applying existing OWL DL reasoners. This approach guarantees sound and complete results for the category of conjunctive queries that the applied OWL DL reasoner correctly evaluates. Experimental results on large benchmark ontologies and benchmark queries show that the proposed approach can significantly improve scalability and efficiency in evaluating general conjunctive queries.
- Book Chapter
7
- 10.1007/3-540-40992-0_18
- Jan 1, 2000
A conjunctive query problem in relational database theory is a problem to determine whether or not a tuple belongs to the answer of a conjunctive query over a database. Here, a tuple and a conjunctive query are regarded as a ground atom and a nonrecursive function-free definite clause, respectively. While the conjunctive query problem is NP-complete in general, it becomes efficiently solvable if a conjunctive query is acyclic. Concerned with this problem, we investigate the learnability of acyclic conjunctive queries from an instance with a j-database which is a finite set of ground unit clauses containing at most j-ary predicate symbols. We deal with two kinds of instances, a simple instance as a set of ground atoms and an extended instance as a set of pairs of a ground atom and a description. Then, we show that, for each j ≥ 3, there exists a j-database such that acyclic conjunctive queries are not polynomially predictable from an extended instance under the cryptographic assumptions. Also we show that, for each n > 0 and a polynomial p, there exists a p(n)-database of size O(2^{p(n)}) such that predicting Boolean formulae of size p(n) over n variables reduces to predicting acyclic conjunctive queries from a simple instance. This result implies that, if we can ignore the size of a database, then acyclic conjunctive queries are not polynomially predictable from a simple instance under the cryptographic assumptions. Finally, we show that, if either j = 1, or j = 2 and the number of elements of a database is at most l (≥ 0), then acyclic conjunctive queries are PAC-learnable from a simple instance with j-databases.
- Conference Article
23
- 10.1145/3196959.3196979
- May 27, 2018
Relational queries, and in particular join queries, often generate large output results when executed over a huge dataset. In such cases, it is often infeasible to store the whole materialized output if we plan to reuse it further down a data processing pipeline. Motivated by this problem, we study the construction of space-efficient compressed representations of the output of conjunctive queries, with the goal of supporting the efficient access of the intermediate compressed result for a given access pattern. In particular, we initiate the study of an important tradeoff: minimizing the space necessary to store the compressed result, versus minimizing the answer time and delay for an access request over the result. Our main contribution is a novel parameterized data structure, which can be tuned to trade off space for answer time. The tradeoff allows us to control the space requirement of the data structure precisely, and depends both on the structure of the query and the access pattern. We show how we can use the data structure in conjunction with query decomposition techniques in order to efficiently represent the outputs for several classes of conjunctive queries.
- Book Chapter
8
- 10.1007/978-3-319-78102-0_3
- Jan 1, 2018
Standard approaches for inference in probabilistic formalisms with first-order constructs include lifted variable elimination (LVE) for single queries. To handle multiple queries efficiently, the lifted junction tree algorithm (LJT) uses a first-order cluster representation of a knowledge base and LVE in its computations. We extend LJT with a full formal specification of its algorithm steps incorporating (i) the lifting tool of counting and (ii) answering of conjunctive queries. Given multiple queries, e.g., in machine learning applications, our approach enables us to compute answers faster than the current LJT and existing approaches tailored for single queries.
- Conference Article
91
- 10.1145/3034786.3034789
- May 9, 2017
We consider the task of enumerating and counting answers to k-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. We exhibit a new notion of q-hierarchical conjunctive queries and show that these can be maintained efficiently in the following sense. During a linear time pre-processing phase, we can build a data structure that enables constant delay enumeration of the query results; and when the database is updated, we can update the data structure and restart the enumeration phase within constant time. For the special case of self-join free conjunctive queries we obtain a dichotomy: if a query is not q-hierarchical, then query enumeration with sublinear delay and sublinear update time (and arbitrary preprocessing time) is impossible. For answering Boolean conjunctive queries and for the more general problem of counting the number of solutions of k-ary queries we obtain complete dichotomies: if the query's homomorphic core is q-hierarchical, then the size of the query result can be computed in linear time and maintained with constant update time. Otherwise, the size of the query result cannot be maintained with sublinear update time. All our lower bounds rely on the OMv-conjecture, a conjecture on the hardness of online matrix-vector multiplication that has recently emerged in the field of fine-grained complexity to characterise the hardness of dynamic problems. The lower bound for the counting problem additionally relies on the orthogonal vectors conjecture, which in turn is implied by the strong exponential time hypothesis. Here, sublinear means O(n^{1-ε}) for some ε > 0, where n is the size of the active domain of the current database.
- Research Article
1
- 10.1016/j.tcs.2005.09.006
- Sep 28, 2005
- Theoretical Computer Science
Prediction-hardness of acyclic conjunctive queries
- Research Article
2
- 10.5194/isprsarchives-xl-4-133-2014
- Apr 23, 2014
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. The cache replacement strategy is the core of a distributed high-speed caching system, and directly affects the cache hit rate and the utilization of a limited cache space. Many reports show that there are temporal and spatial local changes in access patterns of geospatial data, and there are popular hot spots which change over time. Therefore, the key issue for a cache replacement strategy for geospatial data is to find a combined method which considers both temporal local changes and spatial local changes in access patterns, and balances the relationship between the changes. The cache replacement strategy should also fit the distribution and changes of hotspots. This paper proposes a cache replacement strategy based on access patterns that exhibit spatiotemporal locality. Firstly, the strategy builds a method to express the access frequency and the time interval for geospatial data access based on a least-recently-used (LRU) replacement algorithm and its data structure; secondly, considering both the spatial correlation between geospatial data access and the caching location for geospatial data, it builds access sequences based on an LRU stack, which reflect the spatiotemporal locality changes in access pattern. Finally, to balance the temporal locality and spatial locality changes in access patterns, the strategy chooses the replacement objects based on the length of access sequences and the cost of caching resource consumption. Experimental results reveal that the proposed cache replacement strategy is able to improve the cache hit rate while achieving a good response performance and higher system throughput. Therefore, it can be applied to handle the intensity of networked GISs data access requests in a cloud-based environment.