Manual program repair is notoriously tedious, error-prone, and costly, especially for modern large-scale projects. Automated program repair can find program patches with little human intervention, greatly reducing the burden on developers and accelerating software delivery. Therefore, much research effort has been dedicated to designing powerful program repair techniques. Although various program repair techniques have been proposed, to our knowledge there has been no extensive study of how repair techniques, subject programs, and test suites affect repair effectiveness and efficiency. In this paper, we perform such a study by repairing 180 seeded and real faults from 17 small- to large-sized programs. We study the impact of five representative automated program repair techniques (GenProg, RSRepair, a brute-force-based technique, AE, and Kali) on the repair results. We further investigate how different subject programs and test suites affect the effectiveness and efficiency of these techniques. Our study yields a number of interesting findings. First, the brute-force-based technique generates the most patches but is also the most costly, while Kali is the most efficient and has medium effectiveness among the studied techniques. Second, techniques that work well on small programs become too costly or ineffective when applied to large programs. Third, since tool-reported patches may overfit the selected test cases, we calculate the false positive rates and find that the influence of failed test cases is much larger than that of passed test cases. Finally, and surprisingly, all the studied techniques except RSRepair find more than 80% of successful patches within the first 50% of the search space.