Datalog Evaluation Research Articles

With its combination of Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping, a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with some surprising results: while 2.2× speedups can be obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come from abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In place of semi-naive evaluation, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog's SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Soufflé's code generator achieve mean 5.2× and 7.6× speedups, respectively, over the optimized code generated by off-the-shelf Soufflé on SMT-heavy Formulog benchmarks. All in all, using compilation and eager evaluation (as appropriate), Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F♯, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that traditional semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.
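To make the contrast between the two traversal orders concrete, here is a minimal single-threaded sketch in Python. It is an illustration only, not Formulog's or Soufflé's implementation: the rule set (transitive closure over a hypothetical `edge` relation), the function names `semi_naive` and `eager`, and the use of an explicit stack are all assumptions chosen for brevity, and the concurrency and SMT aspects described in the abstract are left out entirely.

```python
# Rules assumed for this sketch (not taken from the paper):
#   path(x, y) :- edge(x, y).
#   path(x, z) :- edge(x, y), path(y, z).

def semi_naive(edges):
    """Round-based evaluation: each round joins only the facts
    that were newly derived in the previous round (the delta)."""
    edges = list(edges)
    path = set(edges)
    delta = set(edges)
    while delta:
        derived = {(x, z) for (x, y) in edges for (y2, z) in delta if y == y2}
        delta = derived - path   # keep only genuinely new facts
        path |= delta
    return path

def eager(edges):
    """Depth-first evaluation: as soon as a fact is derived, its
    consequences are pursued before any other pending fact."""
    edges = list(edges)
    preds = {}                   # y -> all x such that edge(x, y)
    for x, y in edges:
        preds.setdefault(y, []).append(x)
    path = set()
    for fact in edges:
        if fact in path:
            continue
        path.add(fact)
        stack = [fact]           # LIFO order gives a depth-first traversal
        while stack:
            y, z = stack.pop()
            for x in preds.get(y, ()):
                new = (x, z)
                if new not in path:
                    path.add(new)
                    stack.append(new)
    return path

if __name__ == "__main__":
    chain = [(1, 2), (2, 3), (3, 4)]
    print(sorted(semi_naive(chain)))
    print(sorted(eager(chain)))
```

Both functions compute the same set of `path` facts; they differ only in the order in which facts are discovered, which is the dimension along which the abstract reports its performance differences.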

We propose a fundamentally new approach to Datalog evaluation. Given a linear Datalog program DB written using N constants and binary predicates, we first translate if-and-only-if completions of clauses in DB into a set Eq(DB) of matrix equations with a non-linear operation, where relations in M_DB, the least Herbrand model of DB, are encoded as adjacency matrices. We then translate Eq(DB) into another set of purely linear matrix equations, Ẽq(DB). It is proved that the least solution of Ẽq(DB) in the sense of matrix ordering is converted to the least solution of Eq(DB), and the latter gives M_DB as a set of adjacency matrices. Hence, computing the least solution of Ẽq(DB) is equivalent to computing M_DB specified by DB. For a class of tail recursive programs and for some other types of programs, our approach achieves O(N^3) time complexity irrespective of the number of variables in a clause, since only matrix operations costing O(N^3) or less are used. We conducted two experiments that compute the least Herbrand models of linear Datalog programs. The first experiment computes the transitive closure of artificial data and real network data taken from the Koblenz Network Collection. The second compares the proposed approach with state-of-the-art symbolic systems, including two Prolog systems and two ASP systems, in terms of computation time for a transitive closure program and the same generation program. In this experiment, we observe that our linear algebraic approach runs 10^1 to 10^4 times faster than the symbolic systems when data is not sparse. Our approach is inspired by the emergence of big knowledge graphs and is expected to contribute to the realization of rich and scalable logical inference for knowledge graphs.
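The idea of encoding relations as adjacency matrices can be illustrated with a small Python sketch. It computes the least solution of the Boolean matrix equation P = A ∨ (A · P) for a transitive closure program by plain fixpoint iteration. This is a simplification under stated assumptions: it is not the linear reformulation Ẽq(DB) described in the abstract, and the helper names `bool_matmul`, `bool_or`, and `transitive_closure` are hypothetical.

```python
# Program assumed for this sketch:
#   path(x, y) :- edge(x, y).
#   path(x, z) :- edge(x, y), path(y, z).
# With N constants, edge and path become N x N Boolean matrices A and P,
# and the least model corresponds to the least solution of P = A or (A . P).

def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is true if some k links i to j."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bool_or(a, b):
    """Element-wise disjunction of two Boolean matrices of equal shape."""
    return [[x or y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def transitive_closure(edge):
    """Iterate P := A or (A . P) from the empty relation until nothing changes."""
    n = len(edge)
    path = [[False] * n for _ in range(n)]
    while True:
        nxt = bool_or(edge, bool_matmul(edge, path))
        if nxt == path:
            return path
        path = nxt

if __name__ == "__main__":
    # Constants 0, 1, 2 with edge(0, 1) and edge(1, 2).
    A = [[False, True, False],
         [False, False, True],
         [False, False, False]]
    for row in transitive_closure(A):
        print(row)
```

Each iteration costs one O(N^3) matrix product in this naive form; the abstract's contribution is to replace the iteration with purely linear matrix equations, which this sketch does not attempt.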

Articles published on Datalog Evaluation

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation

Evaluating Datalog over Semirings: A Grounding-based Approach

Jumping Evaluation of Nested Regular Path Queries

Fast datalog evaluation for batch and stream graph processing

Debugging Large-scale Datalog

Specializing parallel data structures for Datalog

Distribution Policies for Datalog

Scaling-up in-memory datalog processing

A linear algebraic approach to datalog evaluation

Research on semantic of updatable distributed logic and its application in access control

Bottom-Up Evaluation of Datalog: Preliminary Report

Selective provenance for datalog programs using top-k queries

Asynchronous and fault-tolerant recursive datalog evaluation in shared-nothing engines

Cologne

The Monadic Second-order Logic Evaluation Problem on Finite Colored Trees: a Database-theoretic Approach

Querying datalog with arrays: Design and implementation issues

Bottom-up evaluation of datalog with negation

A functional approach to database updates
