Complete dynamic control flow is a fundamental kind of execution profile about program executions with a wide range of applications. Tracing the dynamic control flow of program executions for a brief period easily generates a trace consisting of billions of control flow events. The number of events in such a trace is large, making both path tracing and path querying to incur significant slowdowns. A major class of path tracing techniques is to design novel trace representations that can be generated efficiently, and encode the inputted sequences of such events so that the inputted sequences are still derivable from the encoded and smaller representations. The control flow semantics in such representations have, however, become obscure, which makes implementing path queries on such a representation inefficient and the design of such queries complicated. We propose a novel two-phase path tracing framework— Hierarchical Program Path (HPP)—to model the complete dynamic control flow of an arbitrary number of executions of a program. In Phase 1, HPP monitors each execution, and efficiently generates a stream of events, namely HPPTree, representing a novel tree-based representation of control flow for each thread of control in the execution. In Phase 2, given a set of such event streams, HPP identifies all the equivalent instances of the same exercised interprocedural path in all the corresponding HPPTree instances, and represents each such equivalent set of paths with a single subgraph, resulting in our compositional graph-based trace representation, namely, HPPDAG. Thus, an HPPDAG instance has the potential to be significantly smaller in size than the corresponding HPPTree instances, and still completely preserves the control flow semantics of the traced executions. Control flow queries over all the traced executions can also be directly performed on the single HPPDAG instance instead of separately processing the trace representation of each execution followed by aggregating their results. We validate HPP using the SPLASH2 and SPECint 2006 benchmarks. Compared to the existing technique, named BLPT (Ball-Larus-based Path Tracing), HPP generates significantly smaller trace representations and incurs fewer slowdowns to the native executions in online tracing of Phase 1. The HPPDAG instances generated in Phase 2 are significantly smaller than their corresponding BLPT and HPPTree traces. We show that HPPDAG supports efficient backtrace querying, which is a representative path query based on complete control flow trace. Finally, we illustrate the ease of use of HPPDAG by building a novel and highly efficient path profiling technique to demonstrate the applicability of HPPDAG.
Read full abstract