Abstract

Research in Mining Software Repositories (MSR) is research involving human subjects, as the repositories usually contain data about developers’ and users’ interactions with the repositories and with each other. The ethics issues raised by such research therefore need to be considered before beginning. This paper presents a discussion of ethics issues that can arise in MSR research, using the mining challenges from the years 2006 to 2021 as a case study to identify the kinds of data used. On the basis of contemporary research ethics frameworks we discuss ethics challenges that may be encountered in creating and using repositories and associated datasets. We also report some results from a small community survey of approaches to ethics in MSR research. In addition, we present four case studies illustrating typical ethics issues one encounters in projects and how ethics considerations can shape projects before they commence. Based on our experience, we present some guidelines and practices that can help in considering potential ethics issues and reducing risks.

Highlights

  • There have been a large number of papers that report the mining of data contained in software repositories, i.e. software data such as source control systems, defect tracking systems, code review repositories, archived communications between project personnel, question-and-answer sites, continuous integration servers, etc.

  • We present an extended version of our initial case study on ethics issues in MSR mining challenges (Gold and Krinke 2020b) in which we focussed on the mining challenges in the years 2010–2019

  • To illustrate issues involved in direct participation research, as a final case study we describe the preparation and administration of the survey that we report in this paper

Introduction

There have been a large number of papers that report the mining of data contained in software repositories, i.e. software data such as source control systems, defect tracking systems, code review repositories, archived communications between project personnel, question-and-answer sites, continuous integration servers, etc. A software repository contains considerable information about the authors of code as a by-product of their interaction with it, and with their collaborators. In studying this data, the researcher is in effect directly or indirectly studying the person through their data. Oezbek (2008) identified that open-source software research (including data mining) involves humans as participants, collaborators, or data sources and requires ethics consideration. Much research focuses on open-source software repositories and one could assume that, as the software is published open-source, no ethics issues will arise (similar to studying published literature). However, the open-source licence applies to the software itself, and the data surrounding it in the repository is not typically licensed in the same way, so repository-focused studies are far more likely to raise ethics issues than code-focused studies.

Communicated by: Georgios Gousios and Sarah Nadi. This article belongs to the Topical Collection: Mining Software Repositories (MSR). Empir Software Eng (2022) 27:17
