Abstract
Context: Code Review (CR) is a cornerstone of software quality assurance and a crucial practice in software development. As CR research matures, it can be difficult to keep track of best practices and the state of the art in methodologies, datasets, and metrics. Objective: This paper investigates the potential for benchmarking by collecting the methodologies, datasets, and metrics used in CR studies. Methods: A systematic mapping study was conducted. A total of 112 studies were selected and analyzed from 19,847 papers published in high-impact venues between 2011 and 2019. Results: First, we find that empirical evaluation is the most common methodology (65% of papers), with solution and experience papers being the least common. Second, we highlight that 50% of the papers using quantitative or mixed methods have the potential for replicability. Third, we identify 457 metrics grouped into sixteen core metric sets and applied across nine Software Engineering topics, showing that different research topics tend to use specific metric sets. Conclusion: We conclude that, at this stage, we cannot benchmark CR studies. Nevertheless, a common benchmark will help new researchers, including experts from other fields, to innovate new techniques and build on top of already established methodologies. A full replication package is available at https://naist-se.github.io/code-review/.