Algebraic Program Analysis (APA) is a ubiquitous framework that has been employed as a unifying model for various problems in data-flow analysis, termination analysis, invariant generation, predicate abstraction, and a wide variety of other standard static analysis tasks. APA models program summaries as elements of a regular algebra A. Suppose that a summary in A is assigned to every transition of the program and that we aim to compute the effect of running the program starting at line s and ending at line t. APA first computes a regular expression capturing all program paths of interest. In the intraprocedural case, this expression models all paths from s to t, whereas in the interprocedural case it models all interprocedurally valid paths, i.e. paths that return to the correct caller when a callee returns. The regular expression is then interpreted over the algebra A to obtain the desired result. Suppose the program has n lines of code and each evaluation of an operation in the regular algebra takes O(k) time. It is well known that a single APA query, or a set of queries with the same starting point s, can be answered in O(n · α(n) · k) time, where α is the inverse Ackermann function.
In this work, we consider an on-demand setting for APA: the program is given as input and can be preprocessed. The analysis must then answer a large number of online queries, each providing a pair (s, t) of program lines that are the start and end points of the query, respectively. The goal is to avoid the significant cost of running a fresh APA instance for each query. Our main contribution is a series of algorithms that, after a lightweight preprocessing step taking O(n · lg n · k) time, answer each query in O(k) time. In other words, our preprocessing has almost the same asymptotic complexity as a single APA query, differing only by a factor of lg n / α(n), which is sub-logarithmic, and every subsequent query is answered essentially instantly, i.e. using a constant number of operations in the algebra.
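The central APA step described above, interpreting a path expression over a regular algebra, can be illustrated with a small sketch. It assumes a hypothetical shortest-path algebra (⊕ = min, ⊗ = +, star = 0 for non-negative weights) and a hand-written path expression; in classical APA the expression is computed from the control-flow graph (e.g. via Tarjan's path-expression algorithm). The names Sym, Plus, Dot, Star, RegularAlgebra and interpret are illustrative assumptions, and this naive recursive evaluation is not the on-demand algorithm of this work.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Dict, Union


# Hypothetical regular-expression AST over program transitions.
@dataclass(frozen=True)
class Sym:
    label: str  # a single transition, e.g. "e1"


@dataclass(frozen=True)
class Plus:
    left: Regex  # choice between two sets of paths
    right: Regex


@dataclass(frozen=True)
class Dot:
    left: Regex  # sequential composition of paths
    right: Regex


@dataclass(frozen=True)
class Star:
    body: Regex  # arbitrary iteration (loops)


Regex = Union[Sym, Plus, Dot, Star]


@dataclass(frozen=True)
class RegularAlgebra:
    plus: Callable[[float, float], float]   # interprets choice
    times: Callable[[float, float], float]  # interprets sequencing
    star: Callable[[float], float]          # interprets iteration


def interpret(r: Regex, alg: RegularAlgebra, summary: Dict[str, float]) -> float:
    """Evaluate a path expression bottom-up, one algebra operation per AST node."""
    if isinstance(r, Sym):
        return summary[r.label]
    if isinstance(r, Plus):
        return alg.plus(interpret(r.left, alg, summary), interpret(r.right, alg, summary))
    if isinstance(r, Dot):
        return alg.times(interpret(r.left, alg, summary), interpret(r.right, alg, summary))
    if isinstance(r, Star):
        return alg.star(interpret(r.body, alg, summary))
    raise TypeError(f"unknown node: {r!r}")


# Assumed example algebra: shortest path lengths over (min, +).
# a* = min(0, a, a + a, ...) is 0 if a >= 0 and -inf if a < 0.
SHORTEST = RegularAlgebra(
    plus=min,
    times=lambda a, b: a + b,
    star=lambda a: 0.0 if a >= 0 else -math.inf,
)

if __name__ == "__main__":
    # Paths from s to t: e1 . (e2 . e3)* . (e4 + e5)
    expr = Dot(Sym("e1"), Dot(Star(Dot(Sym("e2"), Sym("e3"))), Plus(Sym("e4"), Sym("e5"))))
    weights = {"e1": 1.0, "e2": 2.0, "e3": 3.0, "e4": 5.0, "e5": 4.0}
    print(interpret(expr, SHORTEST, weights))  # prints 5.0
```

Each node of the expression costs one algebra operation, so evaluating an expression of size O(n) costs O(n · k). Answering many (s, t) queries this way would repeat that cost per query, which is the overhead the on-demand setting aims to avoid.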
We achieve this remarkable speedup by exploiting certain structural sparsity properties of control-flow graphs (CFGs) and call graphs (CGs). Specifically, we rely on the fact that the CFGs of real-world programs have a tree-like structure, with bounded treewidth and nesting depth, and that their call graphs have small treedepth relative to the size of the program. Finally, we provide experimental results demonstrating the effectiveness and efficiency of our approach and showing that it outperforms the runtime of classical APA by several orders of magnitude.