Code reviews are central to software quality assurance. Ideally, reviewers should explain their feedback so that the authors of code changes can understand it and act accordingly. Different developers may need different explanations in different contexts. Assisting this process therefore first requires understanding the types of explanations reviewers usually provide. The goal of this paper is to study the types of explanations used in code reviews and to explore the potential of Large Language Models (LLMs), specifically ChatGPT, in generating these specific types. We extracted 793 code review comments from Gerrit and manually labeled them based on whether they contained a suggestion, an explanation, or both. Our analysis shows that 42% of the comments include only suggestions without explanations. We categorized the explanations into seven distinct types, including rule or principle, similar examples, and future implications. When measuring their prevalence, we observed that some explanation types are used differently by novice and experienced reviewers. Our manual evaluation shows that, when the explanation type is specified, ChatGPT correctly generates the explanation in 88 out of 90 cases. This foundational work highlights the potential for future automation in code reviews that could help developers share and obtain different types of explanations as needed, thereby reducing back-and-forth communication.