Automated Program Repair (APR) automatically fixes software defects without manual debugging and plays a crucial role in software development and maintenance. However, APR-generated patches often overfit: they pass the available tests without actually fixing the defect, which poses a significant threat to practical adoption. To address this issue, previous studies have proposed various approaches for predicting patch correctness, chiefly static, dynamic, and learning-based methods. Each of these, however, has limitations that make it difficult to extract code semantic features accurately and to predict correctness comprehensively. To address these challenges, we propose APOSTLE (Automated Patch cOrrectness aSsessmenT based on muLtiple pErspectives), a learning-based unsupervised classification model for predicting patch correctness. APOSTLE consists of three components: (1) a code vectorization component, which uses advanced pre-trained models to efficiently extract semantic features from code; (2) a similarity and code-change-degree calculation component, which computes code similarity and the degree of code change; and (3) a comprehensive evaluation component, which combines these perspectives into an overall assessment so that the prediction is comprehensive. Experiments on 1,278 patches (written by developers or generated by 32 APR tools) show that APOSTLE achieves AUC, MAP, and MRR values of 0.801, 0.855, and 0.944, outperforming the state-of-the-art approach BATS by 8.3%, 6.0%, and 9.0% on these metrics, respectively. These results indicate that APOSTLE extracts code semantic features accurately while predicting patch correctness comprehensively.
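The abstract does not give APOSTLE's formulas, so the following is only a minimal sketch of how the similarity and change-degree perspectives could be combined into a single score, under explicit assumptions. Cosine similarity over embedding vectors is assumed for the similarity perspective. The toy vectors stand in for pre-trained model output. Python's `difflib.SequenceMatcher` ratio is a swapped-in measure for the degree of change. The names `change_degree`, `correctness_score`, and the weight `alpha` are hypothetical. None of these are APOSTLE's actual method.

```python
import difflib
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treat as dissimilar
    return dot / (norm_u * norm_v)


def change_degree(buggy_src, patched_src):
    """Hypothetical degree of change: 0.0 = identical, 1.0 = nothing in common.

    Assumption: uses difflib's character-level similarity ratio as a stand-in.
    """
    return 1.0 - difflib.SequenceMatcher(None, buggy_src, patched_src).ratio()


def correctness_score(buggy_vec, patched_vec, buggy_src, patched_src, alpha=0.5):
    """Hypothetical combined score; higher means 'more likely correct'.

    Rewards semantic similarity to the buggy code and penalizes large edits.
    alpha is an illustrative weight, not a value taken from APOSTLE.
    """
    sim = cosine_similarity(buggy_vec, patched_vec)
    change = change_degree(buggy_src, patched_src)
    return alpha * sim + (1.0 - alpha) * (1.0 - change)


# Usage: rank two candidate patches for the same buggy line.
buggy = "if (x > 0) return a / x;"
patch_a = "if (x != 0) return a / x;"  # small, targeted edit
patch_b = "return 0;"                  # discards the original logic

# Toy 3-dimensional "embeddings" standing in for pre-trained model output.
vec_buggy = [0.9, 0.1, 0.3]
vec_a = [0.85, 0.15, 0.3]
vec_b = [0.1, 0.9, 0.2]

score_a = correctness_score(vec_buggy, vec_a, buggy, patch_a)
score_b = correctness_score(vec_buggy, vec_b, buggy, patch_b)
print(score_a > score_b)  # True: the small, semantically close patch ranks higher
```

Scoring each patch independently, with no labeled training data, keeps the sketch in the unsupervised spirit the abstract describes. A threshold on the score could then turn the ranking into a correct/overfitting classification.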