In collaborative software development, programmers create branches to edit a program simultaneously, and merge branches to integrate their edits. When branches make divergent edits to the same text, the edits conflict and cannot be applied together. Tools have been built to automatically merge software branches, to detect conflicts, and to resolve conflicts along the way. However, there is no third-party benchmark or metric to comprehensively evaluate or compare those tools. This paper presents ConflictBench, our novel benchmark to evaluate software merge tools. ConflictBench consists of 180 merging scenarios extracted from 180 open-source Java projects. For each scenario, we sampled a conflicting chunk (i.e., conflict) reported by git-merge. Because git-merge sometimes wrongly reports conflicts, we manually inspected the chunks and labeled 136 of the 180 as true conflicts and 44 as false conflicts. To facilitate tool evaluation, we also defined a systematic method of manual analysis to analyze all program versions involved in each merging scenario, and to summarize the root causes as well as developers’ resolution strategies. We further defined three novel metrics to evaluate merge tools. By applying five state-of-the-art tools to ConflictBench, we observed that ConflictBench effectively characterizes different tools. It helps reveal limitations of existing tools and sheds light on future research.