Abstract
Context: GitHub, nowadays the most popular social coding platform, has become the reference for mining Open Source repositories, a growing research trend aiming at learning from previous software projects to improve the development of new ones. In recent years, a considerable number of research papers have been published reporting findings based on data mined from GitHub. As the community continues to deepen its understanding of software engineering thanks to the analysis performed on this platform, we believe that it is worthwhile to reflect on how research papers have addressed the task of mining GitHub and what findings they have reported. Objective: The main objective of this paper is to identify the quantity, topics, and empirical methods of research works targeting the analysis of how software development practices are influenced by the use of a distributed social coding platform like GitHub. Method: We conducted a systematic mapping study with four research questions and assessed 80 publications from 2009 to 2016. Results: Most works focused on the interaction around coding-related tasks and project communities. We also identified some concerns about the reliability of these results, given that, overall, papers used small data sets and poor sampling techniques, employed a limited variety of methodologies, and/or were hard to replicate. Conclusions: This paper attested to the high activity of research work around the field of Open Source collaboration, especially in the software domain, revealed a set of shortcomings, and proposed some actions to mitigate them. We hope that this paper can also form the basis for additional studies on other collaborative activities (like book writing, for instance) that are also moving to GitHub.
Highlights
Software forges are web-based collaborative platforms providing tools to ease distributed development, especially useful for Open Source Software (OSS) development
We present a systematic mapping study of all papers reporting findings that rely on the analysis and mining of software repositories in GitHub
RQ1: What topics/areas have been addressed? We summarize the main findings reported by the selected works, grouped into topics and areas of research interest, to get a better overview of what the contributions of those papers are and what we can learn from them
Summary
Software forges are web-based collaborative platforms providing tools to ease distributed development, especially useful for Open Source Software (OSS) development. GitHub represents the newest generation of software forges, since it combines the traditional capabilities offered by such systems (e.g., free hosting or a version control system) with social features [1]. At the heart of GitHub is Git [2], a decentralized version control system that manages and stores revisions of projects based on master-less peer-to-peer replication, where any replica of a given project can send or receive any information to or from any other replica. Despite its close relation with Git, GitHub comes with many features of its own, specifically aimed at facilitating collaboration and social interactions around projects (e.g., an issue tracker, pull request support, and watching and following mechanisms). The platform provides access to its hosted projects’ metadata through the GitHub API, facilitating further analysis.
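As an illustration of the kind of metadata access mentioned above, the following is a minimal sketch, not the procedure used by the surveyed papers. It assumes the public GitHub REST endpoint for a single repository and unauthenticated access, which is heavily rate-limited. The helper names (`repo_url`, `summarize_repo`, `fetch_repo`) and the choice of fields are hypothetical and made here only for illustration.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def repo_url(owner: str, repo: str) -> str:
    """Build the REST endpoint URL for a single repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}"


def summarize_repo(payload: dict) -> dict:
    """Keep a handful of repository fields (an illustrative selection)."""
    return {
        "full_name": payload.get("full_name"),
        "stars": payload.get("stargazers_count", 0),
        "forks": payload.get("forks_count", 0),
        "open_issues": payload.get("open_issues_count", 0),
        "language": payload.get("language"),
        "created_at": payload.get("created_at"),
    }


def fetch_repo(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch one repository's metadata over HTTP and summarize it.

    Assumption: unauthenticated requests; a real mining study would
    normally authenticate and handle rate limits and pagination.
    """
    request = urllib.request.Request(
        repo_url(owner, repo),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "mining-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize_repo(json.load(response))
```

Calling `fetch_repo("git", "git")` would return a small dictionary for that repository, provided the network is reachable and the rate limit has not been exhausted. Keeping `summarize_repo` separate from the HTTP call makes the field selection easy to check against stored API responses.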