Symbolic execution has become an indispensable technique for software testing and program analysis. However, with several symbolic execution tools now available off the shelf, a practical approach for benchmarking them is needed. This paper introduces a new approach for benchmarking symbolic execution tools in a fine-grained and efficient manner. The approach evaluates the performance of such tools against known challenges faced by general symbolic execution techniques, e.g., floating-point numbers and symbolic memory. We first survey related papers and systematize the challenges of symbolic execution, extracting 12 distinct challenges from the literature and grouping them into two categories: symbolic-reasoning challenges and path-explosion challenges. Next, we develop a dataset of logic bombs and a framework for benchmarking symbolic execution tools automatically. For each challenge, our dataset contains several logic bombs, each addressing a specific challenging problem; triggering a logic bomb confirms that the symbolic execution tool under test can handle the corresponding problem. Real-world experiments with three popular symbolic execution tools, namely KLEE, angr, and Triton, show that our approach can reveal the capabilities and limitations of the tools in handling specific issues accurately and efficiently. The benchmarking process generally takes only a few dozen minutes to evaluate a tool. We have released our dataset on GitHub as open source to facilitate future community work on benchmarking symbolic execution tools.