Unity Is Strength: Collaborative LLM-Based Agents for Code Reviewer Recommendation

Abstract

Assigning pull requests to appropriate code reviewers can accelerate the review process and help uncover potential bugs. However, the inherent complexities in pull requests and code reviewers present challenges in making suitable matches between them. Prior studies focus on mining rich semantic information from pull requests or profile information from code reviewers to improve efficiency. These approaches often overlook the intrinsic relationships between pull requests and code reviewers, which can be represented by a combination of multiple factors and strategies, resulting in suboptimal recommendation accuracy.

Similar Papers
  • Conference Article
  • Cited by 10
  • 10.1145/3463274.3463336
Detection and Elimination of Systematic Labeling Bias in Code Reviewer Recommendation Systems
  • Jun 21, 2021
  • K Ayberk Tecimer + 3 more

Reviewer selection in modern code review is crucial for effective code reviews. Several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models on datasets collected from real projects following open-source or industrial practices. The techniques invariably presume that these datasets reliably represent the "ground truth." In the context of a classification problem, ground truth refers to the objectively correct labels of a class used to build models from a dataset or to evaluate a model's performance. In a project dataset used to build a code reviewer recommendation system, the code reviewer picked for a PR is usually assumed to be the best code reviewer for that PR. However, in practice, the picked code reviewer may not be the best possible code reviewer, or even a qualified one. Recent code reviewer recommendation studies suggest that the datasets used tend to suffer from systematic labeling bias, making the ground truth unreliable. Therefore, models and recommendation systems built on such datasets may perform poorly in real practice. In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias that we remove results from selecting reviewers who do not ensure a permanently successful fix for a bug-related PR. To demonstrate the effectiveness of our approach, we evaluated it on two open-source project datasets (HIVE and QT Creator) and with five code reviewer recommendation techniques (Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree). Our debiasing approach appears promising, since it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques by up to 26% on the datasets used.
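Several of the entries on this page report Mean Reciprocal Rank (MRR). As a reference point, here is a minimal sketch of how MRR is typically computed for reviewer recommendation; the reviewer names and data are illustrative, not taken from any of the listed studies:

```python
def mean_reciprocal_rank(ranked_lists, actual_reviewers):
    """For each pull request, take 1/rank of the first recommended reviewer
    that matches an actually assigned reviewer (0 if none appears),
    then average over all pull requests."""
    total = 0.0
    for ranked, actual in zip(ranked_lists, actual_reviewers):
        for rank, reviewer in enumerate(ranked, start=1):
            if reviewer in actual:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Two PRs: correct reviewer found at rank 2 and rank 1 -> (0.5 + 1.0) / 2
print(mean_reciprocal_rank(
    [["alice", "bob", "carol"], ["dave", "erin"]],
    [{"bob"}, {"dave"}],
))  # 0.75
```

A recommender that always places the true reviewer first scores an MRR of 1.0, which is why the metric rewards ranking quality and not just membership in the top-k list.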

  • Conference Article
  • Cited by 18
  • 10.1109/esem.2019.8870190
Investigating the Validity of Ground Truth in Code Reviewer Recommendation Studies
  • Sep 1, 2019
  • Emre Dogan + 3 more

Background: Selecting the ideal code reviewer in modern code review is a crucial first step toward performing effective code reviews. Several algorithms have been proposed in the literature for recommending the ideal code reviewer for a given pull request. The success of these code reviewer recommendation algorithms is measured by comparing the recommended reviewers with the ground truth, that is, the reviewers actually assigned in real life. However, in practice, the assigned reviewer may not be the ideal reviewer for a given pull request. Aims: In this study, we investigate the validity of ground truth data in code reviewer recommendation studies. Method: By conducting an informal literature review, we compared the reviewer selection heuristics used in real life with the algorithms used in recommendation models. We further support our claims with empirical data from code reviewer recommendation studies. Results: Through the literature review and accompanying empirical data, we show that the ground truth data used in code reviewer recommendation studies is potentially problematic. This reduces the validity of code reviewer datasets and reviewer recommendation studies. Conclusion: We demonstrated cases where the ground truth in code reviewer recommendation studies is invalid and discussed potential solutions to address this issue.

  • Conference Article
  • Cited by 3
  • 10.1109/icse-seip55303.2022.9794124
Code Reviewer Recommendation in Tencent: Practice, Challenge, and Direction*
  • May 1, 2022
  • Qiuyuan Chen + 5 more

Code review is essential for assuring system quality in software engineering. Over decades of practice, code review has evolved into a lightweight, tool-based process focusing on the code change, the smallest unit of the development cycle; we refer to this as Modern Code Review (MCR). MCR involves code contributors committing code changes and code reviewers reviewing the assigned changes. This reviewer assignment process is challenged by the need to efficiently find appropriate reviewers. Recent studies propose automated code reviewer recommendation (CRR) approaches to resolve such challenges. These approaches are often evaluated on open-source projects and obtain promising performance. However, code reviewer recommendation systems are not widely used in proprietary projects, where most current reviewer selection practice is still manual or, at best, semi-manual. No previous work has systematically evaluated these approaches' effectiveness and compared them against each other on proprietary projects in practice. In this paper, we perform a quantitative analysis of typical recommendation approaches on proprietary projects at Tencent. The results show imperfect performance of these approaches on proprietary projects and reveal practical challenges such as the "cold start" problem. To better understand these challenges, we interviewed practitioners about their expectations for applying reviewer recommendation in a production environment. The interviews cover the current systems' limitations, expected application scenarios, and information requirements. Finally, we discuss the implications and directions for practical code reviewer recommendation tools.

  • Research Article
  • 10.37190/e-inf240108
Automated Code Reviewer Recommendation for Pull Requests
  • Jan 1, 2024
  • e-Informatica Software Engineering Journal
  • Mina-Sadat Moosareza + 1 more

With the advent of distributed software development based on pull requests, code changes can be reviewed by a third party before being integrated into the main program in an informal, tool-based process called Modern Code Review (MCR). Effectively performing MCR can facilitate the software evolution phase by reducing post-release defects. MCR allows developers to invite appropriate reviewers to inspect their code once a pull request has been submitted. In many projects, selecting the right reviewer is time-consuming and challenging due to the high volume of requests and the large number of potential reviewers. Various recommender systems that use heuristics, machine learning, or social networks to automatically suggest reviewers have been proposed in the past. Many previous approaches focus on a narrow set of features of candidate reviewers, such as their reviewing expertise, and some have been evaluated on small datasets that do not provide generalizability. Additionally, they commonly fail to meet the desired accuracy, precision, or recall standards. Aim: Our aim is to increase the accuracy of code reviewer recommendations by calculating scores relatively and by weighting the recency of activities in an optimal way. Method: Our work presents a heuristic approach that takes into account candidate reviewers' expertise in both reviewing and committing, as well as their social relations, to automatically recommend code reviewers. During the development of the approach, we examine how each of the reviewers' features contributes to their suitability to review the new request. Results: We evaluated our algorithm on five open-source projects from GitHub. Results indicate that our proposed approach achieves a top-1 accuracy of 46%, a top-3 accuracy of 75%, and a mean reciprocal rank of 62%, outperforming previous related works.
Conclusion: These results indicate that combining different features of reviewers, including their expertise level and previous collaboration history, can lead to better code reviewer recommendations, as demonstrated by the achieved improvements over previous related works.
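The recency weighting this abstract describes can be illustrated with a minimal sketch; the exponential-decay form and the half-life value below are assumptions chosen for illustration, not the paper's actual formula:

```python
import math

def recency_weighted_score(days_ago_list, half_life_days=30.0):
    """Sum of exponentially decayed review contributions: a review from
    today counts 1.0, one from `half_life_days` ago counts 0.5."""
    lam = math.log(2) / half_life_days
    return sum(math.exp(-lam * d) for d in days_ago_list)

# A reviewer with three recent reviews outscores one with five stale reviews,
# even though the latter has more total activity.
recent = recency_weighted_score([1, 3, 7])
stale = recency_weighted_score([90, 120, 150, 180, 365])
print(recent > stale)  # True
```

The design point is that raw review counts overweight long-inactive candidates; decaying each contribution lets recent activity dominate without discarding history entirely.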

  • Conference Article
  • Cited by 95
  • 10.1145/2889160.2889244
CoRReCT
  • May 14, 2016
  • Mohammad Masudur Rahman + 2 more

Peer code review locates common coding rule violations and simple logical errors in the early phases of software development, and thus reduces overall cost. However, in GitHub, identifying an appropriate code reviewer for a pull request is a non-trivial task given that reliable information for reviewer identification is often not readily available. In this paper, we propose a code reviewer recommendation technique that considers not only the relevant cross-project work history (e.g., external library experience) but also the experience of a developer in certain specialized technologies associated with a pull request for determining her expertise as a potential code reviewer. We first motivate our technique using an exploratory study with 10 commercial projects and 10 associated libraries external to those projects. Experiments using 17,115 pull requests from 10 commercial projects and six open source projects show that our technique provides 85–92% recommendation accuracy, about 86% precision and 79–81% recall in code reviewer recommendation, which are highly promising. Comparison with the state-of-the-art technique also validates the empirical findings and the superiority of our recommendation technique.

  • Conference Article
  • Cited by 9
  • 10.1109/saner53432.2022.00080
Recommending Code Reviewers for Proprietary Software Projects: A Large Scale Study
  • Mar 1, 2022
  • Dezhen Kong + 5 more

Code review is an important activity in software development, offering benefits such as improving code quality, reducing defects, and distributing knowledge. Tencent, as a giant company, hosts a great number of proprietary software projects that are open only to specific internal developers. Since these proprietary projects receive up to 100,000 newly submitted code changes per month, automatically recommending code reviewers is sorely needed. To this end, we first conduct an empirical study on a large set of proprietary projects from Tencent to understand their characteristics and how code reviewer recommendation approaches work on them. Based on the derived findings and implications, we propose a new approach named Camp that recommends reviewers by considering their collaboration and expertise across multiple projects, to fit the context of proprietary software development. The evaluation results show that Camp achieves higher scores on proprietary projects across most metrics than other state-of-the-art approaches (i.e., Revfinder, CHREV, Tie, and Comment Network) and produces acceptable performance scores for more projects. In addition, we discuss possible directions for code reviewer recommendation.

  • Research Article
  • Cited by 27
  • 10.1016/j.scico.2021.102652
A review of code reviewer recommendation studies: Challenges and future directions
  • Apr 14, 2021
  • Science of Computer Programming
  • H Alperen Çetin + 2 more


  • Research Article
  • Cited by 2
  • 10.3390/electronics12092113
A Code Reviewer Recommendation Approach Based on Attentive Neighbor Embedding Propagation
  • May 5, 2023
  • Electronics
  • Jiahui Liu + 3 more

Code review is an effective software quality assurance practice that has been widely applied in many open-source software communities. However, finding a suitable reviewer for a given code change can be very challenging in open-source communities due to the difficulty of learning the characteristics of reviewers and the sparsity of code-reviewer interactions. To tackle this problem, most previous approaches focus on learning developers' capabilities and experience and recommending suitable developers based on their historical interactions. However, such approaches usually suffer from data-sparsity and noise problems, which may reduce recommendation accuracy. In this paper, we propose an attentive neighbor embedding propagation enhanced code reviewer recommendation framework (termed ANEP). In ANEP, we first construct the reviewer-code interaction graph and learn semantic representations of reviewers and code based on the transformer model. Then, we explicitly explore attentive high-order embedding propagation for reviewers and code and refine the representations along their neighbors. Finally, to evaluate the effectiveness of ANEP, we conduct extensive experiments on four real-world datasets. The experimental results show that ANEP significantly outperforms other state-of-the-art approaches.

  • Conference Article
  • Cited by 9
  • 10.1109/qrs51102.2020.00069
Is There a "Golden" Rule for Code Reviewer Recommendation? An Experimental Evaluation
  • Dec 1, 2020
  • Yuanzhe Hu + 4 more

Peer code review has been proven to be an effective practice for quality assurance and is widely adopted by commercial companies and open source communities such as GitHub. However, identifying an appropriate code reviewer for a pull request is a non-trivial task considering the large number of candidate reviewers. Several approaches have been proposed for reviewer recommendation, yet none has conducted a complete comparison to explore which one is more effective. This paper conducts an experimental evaluation of the commonly used and state-of-the-art approaches for code reviewer recommendation. We begin with a systematic review of approaches for code reviewer recommendation and choose six approaches for experimental evaluation. We then implement these approaches and conduct reviewer recommendation on 12 large-scale open source projects with 53,005 pull requests spanning two years. Results show that there is no golden rule for selecting code reviewer recommendation approaches, and the best approach varies with the evaluation metric (e.g., Top-5 Accuracy, MRR) and the experimental project. Nevertheless, TIE, which utilizes textual similarity and file path similarity, is the most promising one. We also explore the sensitivity of these approaches to training data and compare their time cost. This study provides new insights and practical guidelines for choosing reviewer recommendation approaches.

  • Research Article
  • Cited by 7
  • 10.1016/j.infsof.2022.106956
Cleaning ground truth data in software task assignment
  • Sep 1, 2022
  • Information and Software Technology
  • K Ayberk Tecimer + 3 more


  • Conference Article
  • Cited by 231
  • 10.1109/saner.2015.7081824
Who should review my code? A file location-based code-reviewer recommendation approach for Modern Code Review
  • Mar 1, 2015
  • Patanamon Thongtanunam + 5 more

Software code review is an inspection of a code change by an independent third-party developer in order to identify and fix defects before integration. Effectively performing code review can improve overall software quality. In recent years, Modern Code Review (MCR), a lightweight and tool-based code inspection, has been widely adopted in both proprietary and open-source software systems. Finding appropriate code-reviewers in MCR is a necessary step in reviewing a code change. However, little is known about the difficulty of finding code-reviewers in distributed software development or about its impact on reviewing time. In this paper, we investigate the impact that the code-reviewer assignment problem has on reviewing time. We find that reviews with a code-reviewer assignment problem take 12 days longer to approve a code change. To help developers find appropriate code-reviewers, we propose RevFinder, a file location-based code-reviewer recommendation approach. We leverage the similarity of previously reviewed file paths to recommend an appropriate code-reviewer. The intuition is that files located in similar file paths would be managed and reviewed by similarly experienced code-reviewers. Through an empirical evaluation on a case study of 42,045 reviews from the Android Open Source Project (AOSP), OpenStack, Qt and LibreOffice projects, we find that RevFinder accurately recommended 79% of reviews with a top-10 recommendation. RevFinder also correctly recommended the code-reviewers with a median rank of 4. The overall ranking of RevFinder is 3 times better than that of a baseline approach. We believe that RevFinder could be applied to MCR to help developers find appropriate code-reviewers and speed up the overall code review process.
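The file-path intuition behind RevFinder can be sketched minimally. RevFinder itself combines several string-similarity variants over file paths; this sketch uses only the shared-path-component prefix, and all reviewer names and paths are illustrative assumptions:

```python
def path_similarity(p1, p2):
    """Fraction of leading path components shared by two file paths."""
    a, b = p1.split("/"), p2.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))

def score_reviewers(new_files, history):
    """Rank candidate reviewers by summed path similarity between the new
    change's files and files they previously reviewed.
    `history` is a list of (reviewer, reviewed_file_path) pairs."""
    scores = {}
    for reviewer, past_path in history:
        for f in new_files:
            scores[reviewer] = scores.get(reviewer, 0.0) + path_similarity(f, past_path)
    return sorted(scores.items(), key=lambda kv: -kv[1])

history = [
    ("alice", "src/ui/button.c"),   # alice reviewed nearby UI code
    ("bob", "docs/readme.md"),      # bob reviewed unrelated docs
]
print(score_reviewers(["src/ui/dialog.c"], history))
```

Here alice ranks first because `src/ui/` matches two of three path components, while bob's history shares none; the real approach refines this with additional similarity measures over the full review history.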

  • Conference Article
  • Cited by 6
  • 10.1109/icsme55016.2022.00021
Exploring the Notion of Risk in Code Reviewer Recommendation
  • Oct 1, 2022
  • Farshad Kazemi + 2 more

Reviewing code changes allows stakeholders to improve the premise, content, and structure of changes prior to or after integration. However, assigning reviewing tasks to team members is challenging, particularly in large projects. Code reviewer recommendation has been proposed to assist with this challenge. Traditionally, the performance of reviewer recommenders has been derived from historical data, where better solutions are those that recommend exactly the reviewers who actually performed tasks in the past. More recent work expands the goals of recommenders to include mitigating turnover-based knowledge loss and avoiding overburdening the core development team. In this paper, we set out to explore how reviewer recommendation can incorporate the risk of defect proneness. To this end, we propose the Changeset Safety Ratio (CSR), an evaluation measurement designed to capture the risk of defect proneness. Through an empirical study of three open source projects, we observe that: (1) existing approaches tend to improve one or two quantities of interest, such as core developers' workload, while degrading others (especially the CSR); (2) Risk Aware Recommender (RAR), our proposed enhancement to multi-objective reviewer recommendation, achieves a 12.48% increase in the expertise of review assignees and an 80% increase in CSR with respect to historical assignees, all while reducing the files at risk of knowledge loss by 19.39% and imposing a negligible 0.93% increase in workload for the core team; and (3) our dynamic method outperforms static and normalization-based tuning methods in adapting RAR to risk-averse and balanced-risk usage scenarios to a significant degree (Conover's test, α < 0.05; small to large Kendall's W).

  • Conference Article
  • Cited by 14
  • 10.1145/3131704.3131718
An Empirical Study of Reviewer Recommendation in Pull-based Development Model
  • Sep 23, 2017
  • Cheng Yang + 5 more

Code review is an important process for reducing code defects and improving software quality. However, in social coding communities using the pull-based model, everyone can submit code changes, which increases the required code review effort. Therefore, there is a great need to understand the code review process and to analyze pre-existing reviewer recommendation algorithms. In this paper, we conduct an empirical study of the PRs and their reviewers in the Rails project. Moreover, we reproduce a popular and effective IR-based code reviewer recommendation algorithm and validate it on our dataset, which contains 16,049 PRs. We find that inactive reviewers are very important to the code reviewing process; however, the pre-existing method's recommendation results strongly depend on the activeness of reviewers.

  • Conference Article
  • Cited by 8
  • 10.1145/3510457.3513035
Code reviewer recommendation in tencent
  • May 21, 2022
  • Qiuyuan Chen + 5 more

Code review is essential for assuring system quality in software engineering. Over decades of practice, code review has evolved into a lightweight, tool-based process focusing on the code change, the smallest unit of the development cycle; we refer to this as Modern Code Review (MCR). MCR involves code contributors committing code changes and code reviewers reviewing the assigned changes. This reviewer assignment process is challenged by the need to efficiently find appropriate reviewers. Recent studies propose automated code reviewer recommendation (CRR) approaches to resolve such challenges. These approaches are often evaluated on open-source projects and obtain promising performance.

  • Conference Article
  • Cited by 7
  • 10.1109/globecom38437.2019.9014249
TIRR: A Code Reviewer Recommendation Algorithm with Topic Model and Reviewer Influence
  • Dec 1, 2019
  • Zhifang Liao + 5 more

Code review is an important way to improve software quality and ensure project security. For a Pull Request (PR), an important method of collaborative code modification on the GitHub open-source community platform, finding a suitable code reviewer is crucial to improving the efficiency of handling submitted code changes. To solve this problem, we propose a reviewer recommendation algorithm based on the Pull Request topic model and the reviewer's influence. The algorithm not only extracts the topic information of a PR through the Latent Dirichlet Allocation (LDA) method, but also analyzes the professional-knowledge influence of reviewers through an influence network. Furthermore, it combines the topic information of reviewers to find appropriate PR reviewers. Experimental results based on GitHub show that the algorithm is more efficient, effectively reducing the time of code review and improving recommendation accuracy.
