Abstract

Analysis of data and computational modelling are central to most scientific disciplines. The underlying computer programs are complex and costly to design. However, these computational techniques are rarely checked during review of the corresponding papers, nor shared upon publication. Instead, the primary method for sharing data and computer programs today is for authors to state "data available upon reasonable request", although the actual code and data are the only sufficiently detailed description of a computational workflow that allows reproduction and reuse. Despite best intentions, these programs and data can quickly disappear from laboratories. Furthermore, there is a reluctance to share: only 8% of papers in recent top-tier AI conferences shared code relating to their publications (Gundersen et al. 2018). This low rate of code sharing is also seen in other fields, e.g. computational physics (Stodden et al. 2018). Given that code and data are rich digital artefacts that can be shared relatively easily, and that funders and journal publishers increasingly mandate sharing of resources, we should be sharing more and following best practices for data and software publication. The permanent archival of valuable code and datasets would allow other researchers to make use of these resources in their work, and would improve the reliability of reporting as well as the quality of tools.
We are building a computational platform, called CODECHECK (http://www.codecheck.org.uk), to enhance the availability, discovery and reproducibility of published computational research. Researchers who provide code and data will have their code independently run to ensure that the computational parts of a workflow can be reproduced. The results from our independent run will then be shared freely post-publication in an open repository. The reproduction is attributed to the person performing the check. Our independent runs will act as a "certificate of reproducible computation". These certificates will be of use to several parties at different times during the generation of a scientific publication.
 
 Prior to peer review, the researchers themselves can check that their code runs on a separate platform.
 During peer review, editors and reviewers can check if the figures in the certificate match those presented in manuscripts for review without cumbersome download and installation procedures.
 Once published, any interested reader can download the software, and even the data, that were used to generate the results shown in the certificate.
 
 The code and results from papers are shared according to the principles we recently outlined (Eglen et al. 2017). To ensure our system scales to large numbers of papers and is trustworthy, our system will be as automated as possible, fully open itself, and rely on open source software and open scholarly infrastructure. This presentation will discuss the challenges faced to date in building the system and in connecting it with existing peer-review principles, and plans for links with open access journals.
 Acknowledgements
 This work has been funded by the UK Software Sustainability Institute, a Mozilla Open Science Mini grant and the German Research Foundation (DFG) under project number PE 1632/17-1.

Highlights

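  • "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship." Buckheit & Donoho (1995). "The problem is that most modern science is so complicated, and most journal articles so brief, it’s impossible for the article to include details of many important methods and decisions made by the researcher." Marwick (2015)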

  • Certificates and a snapshot of the data, code and outputs are deposited on Zenodo by the codechecker.

  • Next steps: 1. How to wrap up the metadata of certificates and artifacts so that they are useful and reusable. 2. Embedding into journal workflows. 3. Training a community of codecheckers. 4. Generating a portfolio of examples. For more information, see http://codecheck.org.uk

Summary

Why share code?

"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship." Buckheit & Donoho (1995)

"The problem is that most modern science is so complicated, and most journal articles so brief, it’s impossible for the article to include details of many important methods and decisions made by the researcher." Marwick (2015)

The CODECHECK philosophy
Who benefits?
Limitations
Next steps