Bounded model checking based debugging solutions search for mutations of program expressions that produce the expected output for a currently failing test. However, the current localization tools are not regression aware : they do not use information from the passing tests in their localization formula. On the other hand, the current repair tools attempt to guarantee regression freedom : when provided with a set of passing tests, they guarantee that none of these tests can break due to the suggested repair patch, thereby constructing a large repair formula. In this paper, we propose regression awareness as a means to improve the quality of localization and to scale repair. To enable regression awareness, we summarize the proof of correctness of each passing test by computing Craig Interpolants over a symbolic encoding of the passing execution, and use these summaries as additional soft constraints while synthesizing altered executions corresponding to failing tests. Intuitively, these additional constraints act as roadblocks, thereby discouraging executions that may damage the proof of a passing test. We use a partial MAXSAT solver to relax the proofs in a systematic way, and use a ranking function that penalizes mutations that damage the existing proofs. We have implemented our algorithms into a tool, TINTIN, that enables regression aware localization and repair. For localizations, our strategy is effective in extracting a superior ranking of suspicious locations: on a set of 52 different versions across 12 different programs spanning three benchmark suites, TINTIN achieves a saving of developer effort by almost 45% (in terms of the locations that must be examined by a developer to reach the ground-truth repair) in the worst case and 27% in the average case over existing techniques. For automated repairs, on our set of benchmarks, TINTIN achieves a 2.3X speedup over existing techniques without sacrificing much on the ranking of the repair patches: the ground-truth repair appears as the topmost suggestion in more than 70% of our benchmarks.
Read full abstract