Abstract

With the popularity and availability of open source software (OSS) projects, Software Engineering (SE) researchers have made many advances in understanding how software is developed. As in any other scientific field, SE research aims to produce results, techniques, and tools that apply to a large number of software projects, ideally all of them. The ideal approach would be to evaluate hypotheses on a randomly selected, statistically significant sample of software projects. In practice, however, past SE studies have evaluated hypotheses on small samples of deliberately chosen OSS projects. More recently, an increasing number of SE researchers have started examining their hypotheses on larger datasets, but these are deliberately chosen as well. The aim of such large-scale studies is to increase the generality of the results. However, generality may not be achieved if the sample of projects chosen for evaluation is homogeneous and not diverse with respect to the entire population of software projects. In this chapter, we present initial work on diversity and representativeness in SE research. We first define what we mean by diversity and representativeness. We then present (a) a way to assess the quality of a given sample of projects with respect to diversity and representativeness and (b) a selection technique that allows one to tailor a sample with high diversity and representativeness.
