Abstract

Code authorship is a key piece of information about large-scale software projects. Among other things, it reveals the division of work, key collaborators, and developers' profiles. Seeking to better understand authorship in large and successful open source communities, we take the Linux kernel as our first case study. In total, we analyze authorship across 66 stable releases. Our analysis is centered on the Degree-of-Authorship (DOA) metric, which accounts for first authorship events (file creation) as well as further code changes. Authorship along the Linux kernel evolution reveals that (a) only a small portion of developers (26%) make significant contributions to the code base, and this ratio is almost constant during the Linux kernel evolution; (b) the number of files per author is highly skewed: a small group of top authors (2%) is responsible for hundreds of files, while most authors (75%) are responsible for at most 10 files; (c) most authors in Linux (76%) are specialists, and the ratio of specialists to generalists tends to be constant; (d) authors with a high number of co-authorship connections tend to work with authors with fewer connections. Furthermore, we replicate the study on an extended dataset composed of 118 well-known GitHub projects. We find that most of the authorship patterns observed in the Linux kernel are also common to other open source projects.

• Only a small portion of the Linux kernel developers (26%) are file authors.
• The distribution of the number of files per author is highly skewed in the Linux kernel.
• Specialization has been a common practice in Linux kernel development over the years.
• Co-authorship patterns suggest some sort of mentorship in Linux development.
• Similar authorship patterns are also common in other popular open source systems.
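
To make the Degree-of-Authorship idea concrete, the following is a minimal sketch of how a DOA score might be computed for each developer of a single file. The abstract states only that DOA accounts for file creation and further code changes. The linear form, the weights, and the two thresholds used here are assumptions drawn from how the DOA model is commonly reported in the literature, not from this text. The names `degree_of_authorship` and `file_authors` are hypothetical.

```python
import math

# ASSUMPTION: weights as commonly reported for the DOA model;
# they are not given in this abstract.
BASE = 3.293
W_FIRST_AUTHORSHIP = 1.098   # bonus for having created the file
W_DELIVERIES = 0.164         # per change made by the developer
W_ACCEPTANCES = 0.321        # damping for changes made by other developers


def degree_of_authorship(first_author: bool, deliveries: int, acceptances: int) -> float:
    """Return the DOA of one developer for one file.

    first_author: True if the developer created the file.
    deliveries:   number of later changes made by this developer.
    acceptances:  number of changes made by all other developers.
    """
    return (
        BASE
        + W_FIRST_AUTHORSHIP * (1 if first_author else 0)
        + W_DELIVERIES * deliveries
        - W_ACCEPTANCES * math.log(1 + acceptances)
    )


def file_authors(history, min_normalized=0.75, min_absolute=BASE):
    """Return the developers considered authors of a file.

    history maps a developer name to (first_author, deliveries, acceptances).
    ASSUMPTION: a developer counts as an author when the DOA, normalized by
    the highest DOA on the file, exceeds min_normalized and the absolute DOA
    is at least min_absolute.
    """
    scores = {dev: degree_of_authorship(*events) for dev, events in history.items()}
    top = max(scores.values())
    return sorted(
        dev
        for dev, score in scores.items()
        if score / top > min_normalized and score >= min_absolute
    )


if __name__ == "__main__":
    # Hypothetical history for one file: alice created it and changed it
    # five times; bob changed it once.
    history = {"alice": (True, 5, 1), "bob": (False, 1, 5)}
    print(file_authors(history))
```

In this sketch, changes made by other developers reduce a developer's score only logarithmically, so a file creator who keeps contributing tends to remain an author even on a busy file.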
