Abstract

Automatic Program Repair (APR) techniques have shown the potential to reduce debugging costs and improve software quality by automatically generating patches that fix bugs. However, they often generate many overfitting patches, which pass a specific test suite but do not fix the bug correctly. This paper proposes MIPI, a novel approach to reducing the number of overfitting patches generated by APR. We leverage recent advances in deep learning to exploit the similarity between the patched method's name (which often encodes the developer's intention for the code) and the semantic meaning of the method's body (which represents the actual implemented behavior) to identify and remove overfitting patches generated by APR tools. Experiments with a large dataset of patches for QuixBugs and Defects4J programs show the promise of our approach. Specifically, out of 1,191 patches generated by 23 existing APR tools, MIPI successfully filters out 254 (32%) of the 797 overfitting patches with a precision of 90% while preserving 93% of the correct patches. MIPI is more precise and less damaging to APR than existing heuristic patch assessment techniques, and it achieves a higher recall than automated testing-based techniques that do not have access to the test oracle. In addition, MIPI is highly complementary to existing automated patch assessment techniques.

Highlights

  • Software is becoming ubiquitous in every aspect of our daily lives, but it often contains bugs

  • Recent studies have shown that a major portion of the plausible patches generated by APR tools are incorrect; such patches are known as overfitting patches

  • We propose a novel patch correctness assessment technique that exploits the developer's intention embedded in the method name


Summary

INTRODUCTION

Software is becoming ubiquitous in every aspect of our daily lives, but it often contains bugs. APR tools often generate many plausible patches that modify the program at a non-buggy location [33]; such patches are probably incorrect even though they are very similar to the original program. To alleviate this issue, we need to reflect the developers' intention behind the original code. Unlike similarity-based approaches, our approach uses the developer's intention enclosed in the meaning of descriptive code elements (e.g., method names), instead of the original code itself, as the origin coordinate for evaluating the correctness of patches. Code understanding models, such as Code2Vec [39], show impressive results in predicting method names and generating text descriptions for code snippets across different projects. Motivated by these successes, we propose leveraging recent advances in deep learning to automatically identify incorrect patches in APR.
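As a rough illustration of this idea (not the authors' actual implementation), the Python sketch below scores a patch by comparing the patched method's declared name against a name representation inferred from the patched body by a Code2Vec-style model. The NameInferenceModel interface, its embedding methods, and the 0.5 threshold are all hypothetical placeholders introduced only for this sketch.

from typing import List
import math

class NameInferenceModel:
    """Placeholder for a trained code-understanding model (e.g., Code2Vec)."""

    def embed_name(self, method_name: str) -> List[float]:
        # A real model maps a method name to a dense semantic vector.
        raise NotImplementedError

    def infer_name_embedding(self, method_body: str) -> List[float]:
        # A real model predicts a name representation from the method body
        # (e.g., from its AST paths).
        raise NotImplementedError

def cosine(u: List[float], v: List[float]) -> float:
    # Standard cosine similarity, guarding against zero-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def looks_overfitting(model: NameInferenceModel,
                      method_name: str,
                      patched_body: str,
                      threshold: float = 0.5) -> bool:
    """Flag a patch whose implemented behavior diverges from the name's intent."""
    declared = model.embed_name(method_name)
    inferred = model.infer_name_embedding(patched_body)
    # Low name/body similarity suggests the patch drifted from the
    # developer's intention and may be overfitting.
    return cosine(declared, inferred) < threshold

Per the paper's outline, the name/body similarity signal feeds a patch correctness classifier rather than a fixed cutoff; the threshold above only keeps the sketch concrete.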

IDENTIFY THE MEANING OF CODE SNIPPETS
PATCH CORRECTNESS CLASSIFIER
DATASET
RESULTS OF RQ1
RESULTS OF RQ2
RQ3: HOW EFFECTIVE IS OUR APPROACH IN IDENTIFYING INCORRECT PATCHES?
RESULTS OF RQ4
METHOD
RESULTS OF RQ5
CONCLUSION