Analysis of COVID‐19 Mathematical and Software Models: Or How NOT to Set Up a Software Project

Daniel J Duffy

doi:10.1002/wilm.10890

Abstract

This report discusses the open-source software that implements the COVID-19 model in [1] (announced March 16, 2020), from the group led by Dr. Neil Ferguson of Imperial College London (ICL). My personal interest was to investigate how the model was implemented after having seen it described as a system of ordinary differential equations (ODEs) on BBC News on March 17, 2020. Anecdotal evidence suggests that the original program (written in C) is at least 20 years old and undocumented (nothing new in the software world; the programmers in this case probably thought it was not necessary to write readable and maintainable software). Furthermore, all 15,000 lines of code were in a single file (such code is sometimes called a ball of mud). On April 22, 2020 a modified version (called 0.7.0, seemingly produced by Microsoft) appeared, consisting of approximately 12 separate source files. This is the version of the program that we review in this article. We do not investigate the fixes that were made between April 22 and the time of writing of this report. Finally, we note that version 0.7.0 is not an implementation of the ODEs that were announced on BBC News with great aplomb on the evening of Saint Patrick's Day 2020. For a discussion of ODEs in epidemiology, see [2]. Before I discovered that the ICL model was not ODE-based (seemingly contradicting the BBC announcement), I solved the MSEIR (iMmune, Susceptible, Exposed, Infective, Recovered) ODE model numerically in C++. The system of equations is relatively benign and I used the C++ Boost odeint library to solve it. (A word of advice: it is tempting to use the Euler method but don't use it, not even for producing cute S-curves in your blogs.) Summarizing, this report is a critique of the quality of the 0.7.0 code based on my experience, background, and how I view software. I was unable to analyze the underlying mathematical model because it is not documented and the code is unreadable.
However, I do have something to say about the (lack of) quality of the code, random number generation (RNG), and (univariate) statistical distributions, as these are of central importance in computational finance, an area in which I am involved (see [3], [4], [5], [6]). This critique is based on incomplete information, namely the bespoke open-source C code. I was unable to find further relevant documentation. Nonetheless, a robust review is still possible. Finally, this report has wider applicability than just the current software system. It pinpoints some of the things that can (and do) go badly wrong in software projects. I will distribute this report to my clients and students. I have been preaching the same “doctrine” for more than 30 years and now is a special opportunity to use it to shine a light on the shortcomings of such a high-profile project. It will be interesting to follow developments in the coming years. One reason for writing this report was to counter some of the technically superficial and somewhat partisan blogs written by what we flippantly call nameless internet warriors (see [17]). It is never clear to me whether there are ulterior or political motives behind these blogs. Such motives are irrelevant here. On the other hand, these kinds of blogs may provide more information on some structural problems in software projects (see [18] for some horror stories).
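To make the remark about the MSEIR model and the Euler method concrete, here is a minimal C++ sketch. It is not the code used for this report. The right-hand side is one commonly cited textbook form of the MSEIR system, and all parameter values are made up for illustration; neither is stated in the text above. To stay free of dependencies the sketch uses a hand-written classical fourth-order Runge-Kutta step instead of Boost odeint, with an explicit Euler step alongside it for comparison. The names `State`, `Params`, `rhs`, `axpy`, `euler_step`, `rk4_step`, and `integrate` are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// State vector: M (passively immune), S, E, I, R.
using State = std::array<double, 5>;

// Illustrative parameters only (assumed values, not from the article).
struct Params {
    double b     = 0.02;  // birth rate
    double d     = 0.02;  // death rate (b == d keeps the population constant)
    double delta = 2.0;   // rate of loss of passive immunity, M -> S
    double beta  = 0.5;   // contact rate
    double eps   = 0.2;   // rate E -> I
    double gamma = 0.1;   // rate I -> R
};

// Right-hand side of the assumed MSEIR system.
State rhs(const State& y, const Params& p) {
    const double M = y[0], S = y[1], E = y[2], I = y[3], R = y[4];
    const double N = M + S + E + I + R;
    const double infection = p.beta * S * I / N;
    return State{
        p.b * (N - S) - (p.delta + p.d) * M,
        p.b * S + p.delta * M - infection - p.d * S,
        infection - (p.eps + p.d) * E,
        p.eps * E - (p.gamma + p.d) * I,
        p.gamma * I - p.d * R};
}

// Returns y + h * k, component by component.
State axpy(const State& y, double h, const State& k) {
    State out{};
    for (std::size_t i = 0; i < y.size(); ++i) out[i] = y[i] + h * k[i];
    return out;
}

// Explicit Euler step: first-order accurate, shown only for comparison.
State euler_step(const State& y, double h, const Params& p) {
    return axpy(y, h, rhs(y, p));
}

// Classical fourth-order Runge-Kutta step.
State rk4_step(const State& y, double h, const Params& p) {
    const State k1 = rhs(y, p);
    const State k2 = rhs(axpy(y, 0.5 * h, k1), p);
    const State k3 = rhs(axpy(y, 0.5 * h, k2), p);
    const State k4 = rhs(axpy(y, h, k3), p);
    State out{};
    for (std::size_t i = 0; i < y.size(); ++i)
        out[i] = y[i] + h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    return out;
}

// Fixed-step integration from t = 0 to t_end with the given stepper.
template <typename Step>
State integrate(State y, double t_end, int steps, const Params& p, Step step) {
    assert(steps > 0);
    const double h = t_end / steps;
    for (int n = 0; n < steps; ++n) y = step(y, h, p);
    return y;
}
```

Calling `integrate(y0, 100.0, 1000, Params{}, rk4_step)` with, say, `y0 = {0, 990, 0, 10, 0}` advances the system over 100 time units; swapping in `euler_step` with the same step count shows how much the first-order method drifts. With these assumed parameters the explicit Euler step is numerically unstable for step sizes above roughly 1, because of the fast `delta` term. In this formulation the five derivatives sum to (b - d)N, so with `b == d` the total population should stay constant, which gives a cheap sanity check on any solver.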

Full Text