Abstract

Achieving research reproducibility is challenging in many ways: there are social and cultural obstacles as well as a constantly changing technical landscape that makes replicating and reproducing research difficult. Users face challenges in reproducing research across different operating systems, in using different versions of software across long projects and among collaborations, and in using publicly available work. The dependencies required to reproduce the computational environments in which research happens can be exceptionally hard to track – in many cases, these dependencies are hidden or nested too deeply to discover, and thus impossible to install on a new machine, which means adoption remains low. In this paper, we present ReproZip, an open source tool to help overcome the technical difficulties involved in preserving and replicating research, applications, databases, software, and more. We examine the current use cases of ReproZip, ranging from digital humanities to machine learning. We also explore potential library use cases for ReproZip, particularly in digital libraries and archives, liaison librarianship, and other library services. We believe that libraries and archives can leverage ReproZip to deliver more robust reproducibility and repository services, as well as enhanced discoverability and preservation of research materials, applications, software, and computational environments.

Highlights

  • Reproducibility is at the core of the research process: it is essential for verification and authentication of results, and for driving a field forward

  • Despite the widespread attention drawn to the subject following the Reproducibility Project: Psychology, carried out by the Center for Open Science (Open Science Collaboration, 2015), reproducibility still remains an elusive target for many researchers (Goodman, Fanelli, and Ioannidis 2016)

  • Gronenschild et al. (2012) discussed how the results of data analyses in neuroscience performed with the same application differed based on the operating system: "We investigated the effects of data processing variables such as FreeSurfer version (v4.3.1, v4.5.0, and v5.0.0), workstation (Macintosh and Hewlett-Packard), and Macintosh operating system version (OS X 10.5 and OS X 10.6)"


Summary

Introduction

Reproducibility is at the core of the research process: it is essential for verification and authentication of results, and for driving a field forward. Each piece of software or tool may have many unforeseen dependencies, and versions that differ from the original configuration may give totally disparate results or not run at all. Addressing these problems, collectively known as ‘dependency hell,’ by hand is an error-prone and resource-heavy process: researchers would have to create a file that encapsulates metadata about their computational environment, including the operating system, hardware architecture, and software library dependencies. ReproZip packages are highly portable, in that ReproZip automatically creates a virtual machine for the user, with no extra work required beyond one click or command, allowing research to be reproduced across different operating systems. While ReproZip has primarily been used in research, in this paper we explore the many ways in which librarians can use ReproZip, from helping user populations create well-managed, reproducible research, to preserving computational environments, to building library infrastructure.
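To give a sense of what the manual approach involves, the following is a minimal Python sketch of writing such an environment metadata file by hand. It is an illustration only and does not show how ReproZip itself works; the function name `describe_environment` and the field names are hypothetical, chosen for this example. It also records only the Python packages installed in the current interpreter, so system libraries, data files, and other hidden dependencies would still be missed, which is the gap the paragraph above describes.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment():
    """Collect a small, hand-rolled description of the current environment.

    Hypothetical helper for illustration: it covers the operating system,
    hardware architecture, and installed Python packages, nothing more.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version

    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "architecture": platform.machine(),
        "python_version": platform.python_version(),
        "python_packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Write the description as JSON so it can be stored next to the research files.
    json.dump(describe_environment(), sys.stdout, indent=2)
    sys.stdout.write("\n")
```

Even this short script has to be rerun and checked whenever the environment changes, and someone else still has to reinstall every listed item on the new machine, which is why the manual route tends to be error-prone.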

Technical Infrastructure
Packing
Unpacking
Current Use Cases
ReproZip in Librarianship
Digital Libraries
Repository Management
Academic Libraries
Future Development Work
ReproZip-Jupyter
Workflow Visualizations and Graphs
Conclusion
