Reuse and maintenance practices among divergent forks in three software ecosystems

John Businge, Thorsten Berger, Sarah Nadi, Moses Openja

doi:10.1007/s10664-021-10078-2

Abstract

With the rise of social coding platforms that rely on distributed version control systems, software reuse is also on the rise. Many software developers leverage this reuse by creating variants through forking, to account for different customer needs, markets, or environments. Forked variants then form a so-called software family; they share a common code base and are maintained in parallel by the same or different developers. As such, software families can easily arise within software ecosystems, which are large collections of interdependent software components maintained by communities of collaborating contributors. However, little is known about the existence and characteristics of such families within ecosystems, especially about their maintenance practices. Improving our empirical understanding of such families will help build better tools for maintaining and evolving them. We empirically explore maintenance practices in such fork-based software families within ecosystems of open-source software. Our focus is on three of the largest software ecosystems in existence today: Android, .NET, and JavaScript. We identify and analyze software families that are maintained together and that exist both on the official distribution platform (Google Play, NuGet, and npm) and on GitHub, allowing us to analyze reuse practices in depth. We mine and identify 38, 526, and 8,837 software families from the ecosystems of Android, .NET, and JavaScript, respectively, to study their characteristics and code-propagation practices. We provide scripts for analyzing code integration within our families. Interestingly, our results show that there is little code integration across the studied software families from the three ecosystems. Our studied families also show that techniques of direct integration using git outside of GitHub are more commonly used than GitHub pull requests. Overall, we hope to raise awareness about the existence of software families within larger ecosystems of software, calling for further research and better tool support to effectively maintain and evolve them.

Highlights

The increased popularity of social-coding platforms such as GitHub has made forking a powerful mechanism for cloning software repositories to create new software.
The community typically distinguishes between two kinds of forks (Zhou et al 2020): social forks, which are created for isolated development with the goal of contributing back to the mainline, and divergent forks, which are created to split off a new development branch, often to steer development in another direction without intending to contribute back, while leveraging the mainline project that defines or adheres to some standards (Sung et al 2020).
We presented a large-scale exploratory study on reuse and maintenance practices via code propagation between variant forks and their mainline counterparts in software ecosystems.

Summary

Introduction

The increased popularity of social-coding platforms such as GitHub has made forking a powerful mechanism for cloning software repositories to create new software. While forking allows isolated development and independent evolution of repositories, the traceability between a fork and its mainline allows comparing their revision histories, for instance, to determine whether one repository is ahead of the other (i.e., contains changes not yet integrated in the other). It also eases commit propagation across the repositories. While a mainline and a forked repository are under no obligation to synchronize any changes, developers commonly propagate their code changes (e.g., new features or bug fixes) among repositories via commit integration (Jiang et al 2017; Openja et al 2020). For tracing such propagation, however, the metadata provided by GitHub is not always reliable.
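
To make the notion of one repository being "ahead" of another concrete, the following minimal sketch counts the commits reachable from a fork's head but not from the mainline's head, and vice versa. It is an illustration only and not the authors' analysis scripts: the names ancestors, ahead_behind, and the toy history dictionary are hypothetical, and commits are modeled as plain string identifiers mapped to their parents.

```python
from typing import Dict, List, Set, Tuple


def ancestors(parents: Dict[str, List[str]], head: str) -> Set[str]:
    """Return head plus every commit reachable from it via parent links."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def ahead_behind(
    parents: Dict[str, List[str]], fork_head: str, mainline_head: str
) -> Tuple[int, int]:
    """Count commits only in the fork (ahead) and only in the mainline (behind)."""
    fork = ancestors(parents, fork_head)
    main = ancestors(parents, mainline_head)
    return len(fork - main), len(main - fork)


# Hypothetical toy history: the fork and the mainline diverge after commit "b".
history = {
    "a": [],
    "b": ["a"],
    "c": ["b"],  # exists only on the mainline
    "d": ["b"],  # exists only on the fork
    "e": ["d"],  # exists only on the fork
}

print(ahead_behind(history, "e", "c"))  # (2, 1): fork is 2 ahead, 1 behind
```

This comparison relies on commit identifiers alone. In git, a commit that is cherry-picked or rebased into another repository receives a new identifier, so an identifier-based count of this kind can miss changes that were in fact propagated.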

Journal: Empirical Software Engineering	Publication Date: Mar 1, 2022
Citations: 14	License type: open-access
