A Multi-Step Learning Approach to Assist Code Review

Oussama Ben Sghaier, Houari Sahraoui

doi:10.1109/saner56733.2023.00049

Abstract

Modern code review is a process for detecting and reducing issues early; it helps ensure the quality of the source code, detect anomalies, and identify potential improvements. However, it is a highly manual activity that requires considerable time and resources. Recent research has addressed this problem by attempting to fully automate the task (i.e., generating code reviews). We believe, however, that removing the reviewer from the process is not the best option, especially considering the high error rates of the proposed approaches. Furthermore, full automation remains far from achievable given the complexity of a task that requires human intelligence. In this work, we aim to assist the reviewer in the code review process. We propose an approach for detecting the type of issue and locating the parts of the code that developers need to revise. In the first phase, we propose a meta-learner that combines a learning-based model and a knowledge-based model to predict the type of issue from the review comment. Then, we use this component to create and label a large dataset composed of quadruplets <original code, review, issue type, revised code>. We use this dataset to fine-tune a pre-trained language model to predict the types of issues (e.g., naming, resource handling) that need to be addressed in the original code snippet. Furthermore, we fine-tune another pre-trained language model to locate these issues in the source code submitted by developers. We evaluate the performance of our approach on a held-out test set not used during training. Our results show that our model accurately locates and predicts the types of issues.
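To make the first phase concrete, the following is a minimal Python sketch of how a knowledge-based scorer and a learning-based scorer might be combined to label a review comment and build a quadruplet. It is not the authors' implementation: the keyword lists, the toy `TokenCountModel`, the weighted-average combination, and every name below are assumptions chosen for illustration. Only the two issue types "naming" and "resource handling" and the quadruplet layout come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword lists (assumption); the abstract only names these two
# issue types as examples and does not describe the knowledge-based model.
KEYWORDS = {
    "naming": ("rename", "name", "naming", "typo"),
    "resource handling": ("close", "leak", "resource", "stream"),
}


def knowledge_based_scores(review: str) -> dict:
    """Score each issue type by its share of keyword hits in the review."""
    text = review.lower()
    hits = {label: sum(kw in text for kw in kws) for label, kws in KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        # No keyword matched: fall back to a uniform distribution.
        return {label: 1.0 / len(KEYWORDS) for label in KEYWORDS}
    return {label: count / total for label, count in hits.items()}


class TokenCountModel:
    """Toy stand-in for the learning-based model (assumption)."""

    def __init__(self):
        self.counts = {}

    def fit(self, examples):
        # examples: iterable of (review text, issue type) pairs.
        for text, label in examples:
            self.counts.setdefault(label, Counter()).update(text.lower().split())

    def predict_scores(self, review: str) -> dict:
        tokens = review.lower().split()
        # Add-one smoothing keeps every known label above zero.
        raw = {label: 1 + sum(c[t] for t in tokens) for label, c in self.counts.items()}
        total = sum(raw.values())
        return {label: value / total for label, value in raw.items()}


def meta_predict(review: str, model: TokenCountModel, weight: float = 0.5) -> str:
    """Combine both scorers with a weighted average (assumed combination rule)."""
    kb = knowledge_based_scores(review)
    ml = model.predict_scores(review)
    combined = {
        label: weight * kb[label] + (1 - weight) * ml.get(label, 0.0) for label in kb
    }
    return max(combined, key=combined.get)


@dataclass
class Quadruplet:
    original_code: str
    review: str
    issue_type: str
    revised_code: str


def label_example(original_code, review, revised_code, model) -> Quadruplet:
    """Turn an unlabeled <original code, review, revised code> triple into a quadruplet."""
    return Quadruplet(original_code, review, meta_predict(review, model), revised_code)
```

In this sketch, a model fitted on a few labeled comments can be passed to `label_example` to annotate unlabeled review triples; the resulting quadruplets would then play the role of the fine-tuning data for the two downstream language models.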

Full Text