Datalog and Recursive Query Processing

Abstract

In recent years, we have witnessed a revival of the use of recursive queries in a variety of emerging application domains such as data integration and exchange, information extraction, networking, and program analysis. A popular language used for expressing these queries is Datalog. This paper surveys for a general audience the Datalog language, recursive query processing, and optimization techniques. This survey differs from prior surveys written in the eighties and nineties in its comprehensiveness of topics, its coverage of recent developments and applications, and its emphasis on features and techniques beyond classical Datalog which are vital for practical applications. Specifically, the topics covered include the core Datalog language and various extensions, semantics, query optimizations, magic-sets optimizations, incremental view maintenance, aggregates, negation, and types. We conclude the paper with a survey of recent systems and applications that use Datalog and recursive queries.
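The abstract refers to the core Datalog language and its bottom-up semantics. As a minimal illustration (not taken from the survey), the classic transitive-closure program can be evaluated by naive iteration to a least fixpoint. The Python sketch below assumes relations are sets of tuples; the function name `naive_tc` is hypothetical.

```python
# Naive bottom-up evaluation of the classic transitive-closure program:
#   path(X, Y) :- edge(X, Y).
#   path(X, Y) :- edge(X, Z), path(Z, Y).
from typing import Set, Tuple

Fact = Tuple[str, str]


def naive_tc(edges: Set[Fact]) -> Set[Fact]:
    path: Set[Fact] = set()
    while True:
        # Re-apply both rules to the *entire* current path relation.
        new_path = set(edges)
        new_path |= {(x, y) for (x, z) in edges for (z2, y) in path if z == z2}
        if new_path == path:  # nothing new: least fixpoint reached
            return path
        path = new_path


edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(naive_tc(edges)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

Each round re-derives every fact from scratch, which is the redundancy that semi-naive evaluation is designed to remove.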

Similar Papers
  • Conference Article
  • Cited by 107
  • 10.1145/1989323.1989456
Datalog and emerging applications
  • Jun 12, 2011
  • Shan Shan Huang + 2 more

We are witnessing an exciting revival of interest in recursive Datalog queries in a variety of emerging application domains such as data integration, information extraction, networking, program analysis, security, and cloud computing. This tutorial briefly reviews the Datalog language and recursive query processing and optimization techniques, then discusses applications of Datalog in three application domains: data integration, declarative networking, and program analysis. Throughout the tutorial, we use LogicBlox, a commercial Datalog engine for enterprise software systems, to allow the audience to walk through code examples presented in the tutorial.
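Magic-sets rewriting is one such optimization technique: it rewrites a program so that bottom-up evaluation only touches facts relevant to a query's bound arguments. The sketch below hand-applies the textbook rewrite to the transitive-closure program for the query `path(a, Y)` and evaluates it naively. It uses Python rather than LogicBlox syntax, and the names `magic_path` and `magic` are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def magic_path(edges: Set[Pair], a: str) -> Set[Pair]:
    """Evaluate the magic-sets rewrite of path for the query ?- path(a, Y)."""
    # Rewritten program (adornment bf: first argument bound):
    #   magic(a).
    #   magic(Z)   :- magic(X), edge(X, Z).
    #   path(X, Y) :- magic(X), edge(X, Y).
    #   path(X, Y) :- magic(X), edge(X, Z), path(Z, Y).
    magic = {a}
    path: Set[Pair] = set()
    changed = True
    while changed:  # naive fixpoint over the rewritten rules
        new_magic = magic | {z for (x, z) in edges if x in magic}
        new_path = {(x, y) for (x, y) in edges if x in magic}
        new_path |= {(x, y) for (x, z) in edges if x in magic
                     for (z2, y) in path if z2 == z}
        changed = new_magic != magic or new_path != path
        magic, path = new_magic, new_path
    # Only tuples whose first argument is the query constant answer the query.
    return {(x, y) for (x, y) in path if x == a}
```

Facts about nodes not reachable from `a` (such as an edge `("d", "e")`) never enter `magic`, so no `path` tuples are derived for them.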

  • Conference Article
  • Cited by 263
  • 10.1145/1142473.1142485
Declarative networking
  • Jun 27, 2006
  • Boon Thau Loo + 8 more

The networking and distributed systems communities have recently explored a variety of new network architectures, both for application-level overlay networks, and as prototypes for a next-generation Internet architecture. In this context, we have investigated declarative networking: the use of a distributed recursive query engine as a powerful vehicle for accelerating innovation in network architectures [23, 24, 33]. Declarative networking represents a significant new application area for database research on recursive query processing. In this paper, we address fundamental database issues in this domain. First, we motivate and formally define the Network Datalog (NDlog) language for declarative network specifications. Second, we introduce and prove correct relaxed versions of the traditional semi-naive query evaluation technique, to overcome fundamental problems of the traditional technique in an asynchronous distributed setting. Third, we consider the dynamics of network state, and formalize the "eventual consistency" of our programs even when bursts of updates can arrive in the midst of query execution. Fourth, we present a number of query optimization opportunities that arise in the declarative networking context, including applications of traditional techniques as well as new optimizations. Last, we present evaluation results of the above ideas implemented in our P2 declarative networking system, running on 100 machines over the Emulab network testbed.
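The traditional, centralized semi-naive technique that NDlog relaxes can be sketched as follows: each round joins only the facts derived in the previous round (the delta) with the base relation, instead of re-deriving everything. This Python sketch covers only the single-node case for the linear transitive-closure program; it is not the paper's relaxed distributed variant, and the name `semi_naive_tc` is an assumption.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str]


def semi_naive_tc(edges: Set[Fact]) -> Set[Fact]:
    # Index edge(X, Z) by Z so that edge(X, Z), delta(Z, Y) is a lookup.
    preds: Dict[str, Set[str]] = {}
    for x, z in edges:
        preds.setdefault(z, set()).add(x)

    path = set(edges)   # the exit rule path(X, Y) :- edge(X, Y) fires once
    delta = set(edges)  # facts that are new as of the last round
    while delta:
        # Join only the previous round's new facts with edge.
        candidates = {(x, y) for (z, y) in delta for x in preds.get(z, ())}
        delta = candidates - path  # keep genuinely new facts only
        path |= delta
    return path
```

For a linear recursive rule such as this one, joining each round's delta with the non-recursive `edge` relation yields the same fixpoint as naive evaluation while avoiding repeated derivations.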

  • Conference Article
  • Cited by 1
  • 10.1145/3341105.3375770
Enhancing recursive graph querying on RDBMS with data clustering approaches
  • Mar 30, 2020
  • Lucas C Scabora + 4 more

Recursive queries are one of the main mechanisms in Relational Database Management Systems for processing topology-aware, or graph-like, queries. However, existing works focus only on optimizing the recursive query statements and their processing, disregarding physical arrangements that might improve performance. In this work, we propose an approach based on adjacency-list storage that physically organizes graph-like data, aiming to reduce both the recursive query time and the number of I/O operations. Using Clustered Tables, we group the adjacency list into chunks in order to (i) store the vertex and edge tables together, in a Combined Tables approach; and (ii) reorder the edge table, in an Edge Clustered Table approach using 20% and 80% of the total adjacency list size. The clustered approaches enabled faster recursive query processing (up to 22%) and a reduction of up to 61% in the number of page accesses compared to the Conventional approach. When starting from multiple vertices, the Combined Tables approach reduced query time by up to 50% in the first join operation, and Edge Clustered Table 20% provided an overall time reduction of up to 20%. The results show that our physical design is effective and allows recursive queries to be used without adaptations.
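For readers unfamiliar with recursive SQL, a recursive graph query of the kind discussed here can be written as a `WITH RECURSIVE` common table expression. The sketch below runs one through Python's built-in `sqlite3` module on an in-memory database. The `edge(src, dst)` schema and the `reachable_from` helper are hypothetical, and none of the paper's clustered physical designs are modeled.

```python
import sqlite3
from typing import Iterable, List, Tuple


def reachable_from(edges: Iterable[Tuple[str, str]], source: str) -> List[str]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE edge(src TEXT, dst TEXT)")
        conn.executemany("INSERT INTO edge VALUES (?, ?)", list(edges))
        rows = conn.execute(
            """
            WITH RECURSIVE reach(node) AS (
                SELECT dst FROM edge WHERE src = ?           -- base case
                UNION                                         -- drops duplicates
                SELECT e.dst FROM edge e JOIN reach r ON e.src = r.node
            )
            SELECT node FROM reach ORDER BY node
            """,
            (source,),
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Using `UNION` rather than `UNION ALL` discards rows that were already produced, so the recursion terminates even when the graph contains cycles.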

  • Conference Article
  • Cited by 22
  • 10.1109/icde.1986.7266260
Some performance results on recursive query processing in relational database systems
  • Feb 1, 1986
  • Jiawei Han + 1 more

The processing of recursive queries in relational database systems poses a great challenge in research on expert database systems. This paper uses both analytical and experimental methods to investigate the performance of several different algorithms in processing a recursive query in first-order recursive databases. The analytical method estimated the I/O and CPU cost and the storage needed in processing recursive queries. The experimental tests were performed on a synthetic relational database built on top of WISS (Wisconsin Storage System) on VAX 11/750. Both analytical and experimental results indicate that for efficient recursive database processing it is important to apply the following heuristics: performing selection first, making use of wavefront relations, and grouping those joins which reduce the size of intermediate results. The termination conditions for recursive queries are also discussed in the paper.
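The first two heuristics can be illustrated together on a single-source reachability query: the selection on the start node is applied before recursing, and each iteration expands only the wavefront of newly reached nodes. This Python sketch is our own illustration. Reading "wavefront relation" as the set of tuples newly produced by the previous iteration is an assumption about the paper's terminology, and `descendants` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def descendants(edges: Iterable[Tuple[str, str]], source: str) -> Set[str]:
    succ: Dict[str, Set[str]] = defaultdict(set)
    for x, y in edges:
        succ[x].add(y)

    result: Set[str] = set()
    wavefront = {source}  # selection first: start only from the bound constant
    while wavefront:
        # Expand only the nodes reached in the previous iteration.
        next_nodes = {y for x in wavefront for y in succ[x]}
        wavefront = next_nodes - result  # drop nodes already found
        result |= wavefront
    return result
```

Starting from the constant avoids computing the full closure and then filtering it, and expanding only the wavefront keeps intermediate results small.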

  • Research Article
  • Cited by 17
  • 10.14778/3311880.3311886
Scaling-up in-memory datalog processing
  • Feb 1, 2019
  • Proceedings of the VLDB Endowment
  • Zhiwei Fan + 5 more

Recursive query processing has experienced a recent resurgence, as a result of its use in many modern application domains, including data integration, graph analytics, security, program analysis, networking and decision making. Due to the large volumes of data being processed, several research efforts across multiple communities have explored how to scale up recursive queries, typically expressed in Datalog. Our experience with these tools indicates that their performance does not translate across domains; e.g., a tool designed for large-scale graph analytics does not exhibit the same performance on program-analysis tasks, and vice versa. Starting from the above observation, we make the following two contributions. First, we perform a detailed experimental evaluation comparing a number of state-of-the-art Datalog systems on a wide spectrum of graph analytics and program-analysis tasks, and summarize the pros and cons of existing techniques. Second, we design and implement our own general-purpose Datalog engine, called RecStep, on top of a parallel single-node relational system. We outline the techniques we applied on RecStep, as well as the contribution of each technique to the overall performance. Using RecStep as a baseline, we demonstrate that it generally outperforms state-of-the-art parallel Datalog engines on complex and large-scale Datalog evaluation, by a 4-6X margin. An additional insight from our work is that it is possible to build a high-performance Datalog system on top of a relational engine, an idea that has been dismissed in past work.

  • Research Article
  • Cited by 145
  • 10.1016/0304-3975(89)90088-1
Recursive query processing: the power of logic
  • Dec 1, 1989
  • Theoretical Computer Science
  • Laurent Vielle

  • Research Article
  • Cited by 22
  • 10.1016/j.is.2016.04.006
Comparing columnar, row and array DBMSs to process recursive queries on graphs
  • Apr 26, 2016
  • Information Systems
  • Carlos Ordonez + 2 more

  • Book Chapter
  • Cited by 1
  • 10.1007/3-540-57301-1_15
Reducing page thrashing in recursive query processing
  • Jan 1, 1993
  • Rakesh Agrawal + 1 more

We introduce the problem of page thrashing in the seminaive algorithm for computing recursive queries. We present techniques that take into consideration the system's paging behavior during query computation to reduce this page thrashing. We also propose a buffering strategy based on the Query Locality Set Model that reduces the total memory requirement of a recursive query. We present simulation results that demonstrate the effectiveness of our techniques, both in single and multi-user environments.

  • Conference Article
  • Cited by 16
  • 10.1109/icde.1988.105490
Semantic query optimization in recursive databases
  • Feb 1, 1988
  • S Lee + 1 more

Semantic query optimization is the process of using semantic knowledge expressed in the form of integrity constraints to transform a query into a semantically equivalent one; one that is thought to be less expensive to process. The authors analyze the possibilities of semantic optimization in a deductive database that includes recursive relations and, consequently, integrity constraints that include recursive literals. They propose a compiled approach to utilizing semantic knowledge in recursive query processing, assuming recursive queries are processed using compiled iterative methods. Also, a method of residue propagation for obtaining implied constraints that are often useful in optimization is presented.

  • Book Chapter
  • 10.1007/3-540-53162-9_43
Recursive query processing in predicate-goal graph
  • Jan 1, 1990
  • Jia Liang Han

The predicate-goal graph (P-G graph) is introduced in this paper to facilitate query processing with linear recursive function-free Horn clauses without negation. A recursive rule is compiled into (an infinite number of) expansions. Each expansion corresponds to a P-G subgraph. The necessary and sufficient condition for the existence of an answer to a query is that at least one P-G subgraph is satisfiable. Two basic recursive query processing strategies, bottom-up evaluation and Prolog computation, are illustrated using the P-G graph. In the P-G graph, query evaluation can be independent of the order of the predicates in the rule expression, and many query processing strategies may be used. This graphic method can be used for comparisons of various recursive query processing strategies, parallel processing, query optimization, etc.
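To make "(an infinite number of) expansions" concrete, consider the linear transitive-closure rules path(X, Y) :- edge(X, Y) and path(X, Y) :- edge(X, Z), path(Z, Y). Unfolding the recursive rule k-1 times gives the k-th expansion, and the relation is the union of all expansions. The notation below is ours, not the paper's P-G graph notation.

```latex
\begin{aligned}
E_1 &:\ \mathit{path}(X,Y) \leftarrow \mathit{edge}(X,Y) \\
E_2 &:\ \mathit{path}(X,Y) \leftarrow \mathit{edge}(X,Z_1),\ \mathit{edge}(Z_1,Y) \\
E_k &:\ \mathit{path}(X,Y) \leftarrow \mathit{edge}(X,Z_1),\ \mathit{edge}(Z_1,Z_2),\ \ldots,\ \mathit{edge}(Z_{k-1},Y) \\[4pt]
\mathit{path} &= \bigcup_{k \ge 1} E_k
\end{aligned}
```

A query such as path(a, b) has an answer exactly when at least one E_k is satisfiable for that binding, which mirrors the condition on P-G subgraphs stated in the abstract.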

  • Research Article
  • Cited by 1
  • 10.15587/1729-4061.2019.180226
A solution for synchronous incremental maintenance of materialized views based on SQL recursive query
  • Oct 9, 2019
  • Eastern-European Journal of Enterprise Technologies
  • Nguyen Tran Quoc Vinh + 5 more

Materialized views are query results that are stored redundantly in the database. They can be used to answer later queries, partially or completely, instead of re-executing those queries from scratch. A large body of published work addresses the maintenance, especially the incremental update, of materialized views, as well as query rewriting that uses them. Some of these works support materialized views based on recursive queries in Datalog. Although most Datalog queries can be translated into SQL queries and vice versa, this is not the case for recursive queries. Recursive queries in Datalog try to find all possible transitive closures. Recursive queries in SQL (Common Table Expressions, CTEs) return direct links but not transitive closures. In this paper, we propose efficient methods for the incremental update of materialized views based on CTEs, and then propose an algorithm that generates C source code for any input SQL recursive query. The synthesized source code implements our proposed incremental update algorithms according to the set of records inserted, deleted, or updated in the base tables. This paper focuses mainly on recursive queries whose execution results are directed tree-structured data. Two cases of tree nodes are considered. In the first case, a child node has only one parent node; in the second case, a child node can have many parent nodes. These two cases represent the two types of real-world relationships between entities: one-to-many and many-to-many, respectively. For one-to-many relationships, the relationship data is stored in some fields of the record describing the child. Those fields are set to null when a concrete relationship is deleted. For many-to-many relationships, the relationships are stored in a separate table, and a concrete relationship is removed by deleting the describing records from that table.
Taking the enforcement of referential integrity into account may help to reduce the search space and therefore improve performance. However, either the set of tree nodes or the set of tree edges may be manipulated. All these combinations lead to different algorithms. Experimental results are provided and discussed to confirm the effectiveness of the proposed methods.
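The core difficulty, updating a recursive result without recomputing it, can be illustrated on a full transitive closure held in memory. When an edge (a, b) is inserted, every new pair must route through that edge, so the new pairs are exactly those from a node that reaches a (or a itself) to a node reachable from b (or b itself). The Python sketch below assumes this simplified setting. It is not the paper's CTE-based method or its C code generator, `insert_edge` is a hypothetical name, and deletions, which need techniques such as counting or delete-and-rederive, are not covered.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def insert_edge(closure: Set[Pair], a: str, b: str) -> Set[Pair]:
    """Add edge (a, b) to a transitive closure in place; return the new pairs."""
    # Nodes that can reach a, plus a itself ...
    sources = {x for (x, y) in closure if y == a} | {a}
    # ... and nodes reachable from b, plus b itself.
    targets = {y for (x, y) in closure if x == b} | {b}
    # Any new path uses the new edge once: x ->* a -> b ->* y.
    delta = {(x, y) for x in sources for y in targets} - closure
    closure |= delta
    return delta
```

Only pairs through the inserted edge are examined, so the cost depends on the sizes of the two neighborhoods rather than on recomputing the whole closure.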

  • Conference Article
  • Cited by 19
  • 10.1109/icde.1991.131472
A rule-based query rewriter in an extensible DBMS
  • Jun 4, 2010
  • B Finance + 1 more

An integrated approach to query rewriting in an extensible database server supporting ADTs, objects, deductive capabilities and integrity constraints is described. The approach is extensible through a uniform high level rule language used by the database implementor to specify optimization techniques. This rule language is compiled to enrich the strategy component and the knowledge base of the rewriter. Rules can be added to specify various aspects of query rewriting, including operation permutation, recursive query processing, integrity constraint addition, predicate simplification and method call simplification.

  • Book Chapter
  • Cited by 3
  • 10.1007/11547686_2
Usable Recursive Queries
  • Jan 1, 2005
  • Tomasz Pieciukiewicz + 2 more

Recursive queries are required for many tasks of database applications. Among them we can mention Bill-Of-Material (BOM), various kinds of networks (transportation, telecommunication, etc.), processing semi-structured data (XML, RDF), and so on. The support for recursive queries in current query languages is limited. In particular, this concerns corresponding extensions of SQL in Oracle and DB2 systems. In this paper we present recursive query processing capabilities for the object-oriented Stack-Based Query Language (SBQL). SBQL offers very powerful and flexible recursive querying capabilities due to the fact that recursive processing operators are fully orthogonal to other capabilities of this language. The presented features aim at the ease of recursive programming in databases and not at building new theoretical foundations. This paper discusses novel SBQL constructs, such as transitive closures, fixed point equations and recursive procedures/views. Their main advantage is that they are seamlessly integrated with object-oriented facilities, computer environment and databases.
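A Bill-Of-Material explosion, one of the recursive tasks listed above, can be sketched as a recursive procedure that multiplies quantities down the part hierarchy. The example is Python, not SBQL; the dictionary encoding of the BOM and the `explode` function are hypothetical, and the recursion assumes the BOM is acyclic (a cyclic BOM would recurse without bound).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical encoding: assembly -> [(component, quantity per assembly), ...]
Bom = Dict[str, List[Tuple[str, int]]]


def explode(bom: Bom, root: str) -> Dict[str, int]:
    """Total quantity of every direct or indirect component of one root unit."""
    totals: Dict[str, int] = defaultdict(int)

    def visit(part: str, multiplier: int) -> None:
        for component, per_unit in bom.get(part, []):
            needed = multiplier * per_unit
            totals[component] += needed  # sum over every path to the component
            visit(component, needed)     # recurse into sub-assemblies

    visit(root, 1)
    return dict(totals)


bike = {
    "bike": [("wheel", 2), ("frame", 1)],
    "wheel": [("spoke", 32), ("rim", 1)],
    "frame": [("tube", 3)],
}
print(explode(bike, "bike"))
```

Quantities for a part reached through several assemblies are summed, which a plain transitive closure (reachability only) does not capture.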

  • Research Article
  • Cited by 12
  • 10.1016/0169-023x(92)90029-b
Algebraic optimization of recursive queries
  • Mar 1, 1992
  • Data & Knowledge Engineering
  • Maurice A.W Houtsma + 1 more

  • Conference Article
  • 10.1109/ride.1992.227414
Chain-based evaluation: a bridge linking recursive and nonrecursive query evaluation
  • Feb 2, 1992
  • Jiawei Han

Many recursive query analysis techniques are qualitative in nature. This contrasts sharply with relational query optimization, which relies heavily on quantitative analysis. This paper shows that chain-based evaluation facilitates quantitative analysis of recursive queries based on the available chain information, database statistics and other quantitative measurements. Chain-based evaluation not only facilitates binding propagation, constraint pushing and the selection of recursive query evaluation algorithms but also provides precise compiled chain forms in relational expressions. Since most recursions in database applications can be compiled into highly regular chain forms, chain-based evaluation is promising for bridging recursive and nonrecursive database query evaluation.
