Assessing the Representativeness of Open Source Projects in Empirical Software Engineering Studies

Hao Zhong, Jacky Keung, Ye Yang

doi:10.1109/apsec.2012.36

Abstract

BACKGROUND: Software engineering researchers have carried out many empirical studies on open source software (OSS) projects to understand the OSS phenomenon and to develop better software engineering techniques. Many of these studies use only a few successful projects as study subjects. Recently, these studies have been criticized and challenged over how well they represent OSS projects.

AIM: First, we aim to examine to what extent data extracted from successful projects differ from data extracted from the majority. If they differ substantially, approaches that are effective on successful projects may not be effective in general. Second, we aim to examine whether successful OSS projects are representative of the whole population of OSS. If they are not, conclusions drawn from only successful projects may reflect the OSS phenomenon only partially.

METHODOLOGY: We analyzed 11,684 OSS projects hosted on SourceForge. When researchers select subjects, they typically select successful projects that are attractive to both users and developers. Considering this preference, we clustered these projects into four categories based on their attractiveness to users and developers, using the K-means clustering technique to produce a combined result. Furthermore, we selected eight indicators that are used in many existing studies (e.g., team sizes) and compared the indicators extracted from different categories to investigate to what degree they differ.

RESULT: For the first research aim, the result shows that 66.1% of projects are under-development projects, 14.7% are user-preference projects, 14.2% are developer-preference projects, and only 5.0% are considered successful. For the second research aim, the result shows that all eight analyzed indicators are highly unbalanced, with a gamma distribution. Furthermore, the result reveals that users and developers of SourceForge have different perceptions of the development status defined by SourceForge.

CONCLUSION: We conclude that successful projects are not representative of the whole population of OSS, and that data extracted from successful projects are quite different from data extracted from the majority. The result implies that conclusions drawn from only a few successful projects may be challenged. This work is important because it allows researchers to refine the conclusions of existing studies, and to better understand and more carefully select OSS project subjects for future empirical experiments.
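The clustering step described in the methodology can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which measurements define attractiveness or how the combined result is produced, so the two features (`user_score`, `developer_score`), the synthetic data, and the choice to run K-means once with k = 4 are all assumptions made here for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's-algorithm K-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return labels, centroids


# Synthetic, hypothetical data: (user_score, developer_score) per project.
# Group centres and sizes are arbitrary and do not come from the paper.
data_rng = random.Random(42)
groups = [((1.0, 1.0), 20), ((8.0, 1.0), 8), ((1.0, 8.0), 8), ((8.0, 8.0), 4)]
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for (cx, cy), size in groups
    for _ in range(size)
]

labels, centroids = kmeans(points, k=4)
for c, centre in enumerate(centroids):
    print(c, labels.count(c), tuple(round(v, 2) for v in centre))
```

K-means with random initialisation can settle in a local optimum, so in practice the run is usually repeated with several seeds, and features on different scales are normalised first. The sketch skips both because the synthetic scores already share one scale.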
