Countering reproducibility issues in mathematical models with software engineering techniques: A case study using a one-dimensional mathematical model of the atrioventricular node.

Christopher Schölzel, Alexander Goesmann, Andreas Dominik, Gernot Ernst, Valeria Blesius, Roger A Bannister

doi:10.1371/journal.pone.0254749

Abstract

One would assume that in silico experiments in systems biology are less susceptible to reproducibility issues than their wet-lab counterparts, because they are free from natural biological variations and their environment can be fully controlled. However, recent studies show that only half of the published mathematical models of biological systems can be reproduced without substantial effort. In this article, we examine the potential causes of failed or cumbersome reproductions in a case study of a one-dimensional mathematical model of the atrioventricular node, which took us four months to reproduce. The model demonstrates that even otherwise rigorous studies can be hard to reproduce due to missing information, errors in equations and parameters, a lack of available data files, non-executable code, missing or incomplete experiment protocols, and missing rationales behind equations. Many of these issues seem similar to problems that have been solved in software engineering using techniques such as unit testing, regression tests, continuous integration, version control, archival services, and a thorough modular design with extensive documentation. Applying these techniques, we reimplement the examined model using the modeling language Modelica. The resulting workflow is independent of the model and can be translated to SBML, CellML, and other languages. It guarantees methods reproducibility by executing automated tests in a virtual machine on a server that is physically separated from the development environment. Additionally, it facilitates results reproducibility, because the model is more understandable and because the complete model code, experiment protocols, and simulation data are published and can be accessed in the exact version that was used in this article.
We found the additional design and documentation effort well justified, even just considering the immediate benefits during development such as easier and faster debugging, increased understandability of equations, and a reduced requirement for looking up details from the literature.

Highlights

Mathematical modeling in systems biology, along with many other fields, is facing a reproducibility crisis [1, 2].
While this work is only a case study of the Inada model, we believe that the issues we found and the solutions we present are relevant for mathematical modeling in systems biology in general.
It might be worthwhile for the systems biology community to consider implementing or using a continuous integration (CI) service with predefined virtual machine images for typical modeling workflows.

Summary

Introduction

Mathematical modeling in systems biology, along with many other fields, is facing a reproducibility crisis [1, 2]. We follow the terminology of Goodman et al. [6], with the following modeling-specific adaptations: Methods reproducibility is achieved if the same code can be used with the same simulation tools and settings to produce the same results as the original study. Results reproducibility is achieved if the model can be rebuilt in a different language, with a different architectural structure, or simulated with different simulation tools using the same experiment protocol to achieve results that closely match those of the original study. For most of this article, we will not discuss inferential reproducibility, as our focus lies on model design and not on biological findings.
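The difference between the two definitions can be illustrated with a small Python sketch. It is not taken from the article: the function names, the tolerance values, and the sample numbers are hypothetical, and reading "the same results" as exact equality and "closely match" as agreement within a relative tolerance is an assumption made only for this example.

```python
import math


def methods_reproduced(original, rerun):
    """Same code, tools, and settings: this sketch assumes an exact match is expected."""
    return len(original) == len(rerun) and all(
        a == b for a, b in zip(original, rerun)
    )


def results_reproduced(original, reimplemented, rel_tol=1e-3, abs_tol=1e-6):
    """Different language or tool: values only need to agree within a tolerance.

    The tolerance defaults are hypothetical and would have to be chosen per model.
    """
    return len(original) == len(reimplemented) and all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(original, reimplemented)
    )


# Hypothetical sampled values of one model variable (not data from the study).
original = [-80.0, -79.5, 20.0]
rerun = list(original)                          # identical rerun of the same code
reimplemented = [-80.0001, -79.5002, 20.0001]   # rebuilt in another tool

print(methods_reproduced(original, rerun))
print(methods_reproduced(original, reimplemented))
print(results_reproduced(original, reimplemented))
```

A check of this kind could serve as a regression test that is run automatically whenever the model code changes, provided that reference data from the original study are available.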

Methods

Results

Discussion

Conclusion