Abstract

Although many programmers write their names in the comments of a source file, such comments are an unreliable way to identify code authors, because the modifications of many programmers are not recorded there. Even when modifications are recorded in a code repository, many authors remain hidden in revision histories. The true authors of source files matter in many research topics. For example, when detecting plagiarism, if two source files share authors, determining plagiarism is more challenging than for source files written by individual authors. Because it is difficult to determine the true authors of a source file, researchers typically use source files whose authors are already known (e.g., the source files from Google Code Jam), but such files are few and less representative. Meanwhile, although some empirical studies touch on code authors, to the best of our knowledge, no prior study has analyzed the characteristics of code authors that are hidden in revision histories. As a result, many research questions about code authors remain open. For example, how many authors can a source file have, and, if a file is written by more than one author, what are the proportions of their contributions? To answer these timely questions, in this paper, we conducted an empirical study on code authors that are hidden in revision histories. To support our study, we implemented a tool called CODA. By comparing the latest code lines with past commits, CODA identifies the true authors of all code lines. With its support, we analyzed 12,092 source files that were written by 506 programmers. Our study answers several interesting questions concerning code authors. For example, we find that 75.4% of source files are written by multiple authors, and that their contributions follow the well-known 80/20 principle. These findings are useful for understanding the authors of source files in open source communities.
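The per-file statistics mentioned above (whether a file has multiple authors, and what share of its lines the top 20% of authors wrote) can be illustrated with a minimal sketch. It is not CODA, whose algorithm is not detailed here. As a simpler, assumed proxy, it takes line-level attributions from `git blame --line-porcelain`, which credits each line to the commit that last changed it. The function names are hypothetical.

```python
import math
from collections import Counter


def parse_blame_porcelain(text: str) -> list[str]:
    """Extract one author name per code line from `git blame --line-porcelain` output.

    Header lines look like 'author Alice'; code lines start with a tab, so they
    never match. 'author-mail', 'author-time', etc. use a hyphen and are skipped.
    """
    return [line[len("author "):] for line in text.splitlines()
            if line.startswith("author ")]


def contribution_shares(line_authors: list[str]) -> list[tuple[str, float]]:
    """Return (author, fraction of lines) pairs, largest contributor first."""
    counts = Counter(line_authors)
    total = sum(counts.values())
    return sorted(((a, c / total) for a, c in counts.items()),
                  key=lambda pair: pair[1], reverse=True)


def is_multi_author(line_authors: list[str]) -> bool:
    """A file counts as multi-author if more than one person is credited with lines."""
    return len(set(line_authors)) > 1


def top_fraction_share(line_authors: list[str], fraction: float = 0.2) -> float:
    """Share of lines written by the top `fraction` of authors (at least one author).

    For an 80/20 pattern, the top 20% of authors would account for roughly 0.8.
    """
    shares = contribution_shares(line_authors)
    k = max(1, math.ceil(len(shares) * fraction))
    return sum(share for _, share in shares[:k])


if __name__ == "__main__":
    # Hypothetical attribution for a 10-line file.
    authors = ["alice"] * 8 + ["bob"] + ["carol"]
    print(is_multi_author(authors))      # True
    print(top_fraction_share(authors))   # 0.8
```

In practice, the input would come from running `git blame --line-porcelain <file>` inside a repository and passing the output to `parse_blame_porcelain`. Because `git blame` only records the last modifier of each line, it can hide earlier contributors, which is exactly the gap the study targets.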
