Abstract

Estimating temporal changes in a target population from phylogenetic or count data is an important problem in ecology and epidemiology. Reliable estimates can provide key insights into the climatic and biological drivers influencing the diversity or structure of that population and evidence hypotheses concerning its future growth or decline. In infectious disease applications, the individuals infected across an epidemic form the target population. The renewal model estimates the effective reproduction number, R, of the epidemic from counts of observed incident cases. The skyline model infers the effective population size, N, underlying a phylogeny of sequences sampled from that epidemic. Practically, R measures ongoing epidemic growth while N informs on historical caseload. While both models solve distinct problems, the reliability of their estimates depends on p-dimensional piecewise-constant functions. If p is misspecified, the model might underfit significant changes or overfit noise and promote a spurious understanding of the epidemic, which might misguide intervention policies or misinform forecasts. Surprisingly, no transparent yet principled approach for optimizing p exists. Usually, p is heuristically set, or obscurely controlled via complex algorithms. We present a computable and interpretable p-selection method based on the minimum description length (MDL) formalism of information theory. Unlike many standard model selection techniques, MDL accounts for the additional statistical complexity induced by how parameters interact. As a result, our method optimizes p so that R and N estimates properly and meaningfully adapt to available data. It also outperforms comparable Akaike and Bayesian information criteria on several classification problems, given minimal knowledge of the parameter space, and exposes statistical similarities among renewal, skyline, and other models in biology. 
Rigorous and interpretable model selection is necessary if trustworthy and justifiable conclusions are to be drawn from piecewise models. [Coalescent processes; epidemiology; information theory; model selection; phylodynamics; renewal models; skyline plots]

Highlights

  • Unlike many standard model selection techniques, MDL accounts for the additional statistical complexity induced by how parameters interact

  • Sampled phylogenies and incidence curves are two related but distinct types of empirical data that inform about the population dynamics and ecology of infectious epidemics

  • Over a range of selection problems, the Fisher information approximation (FIA) generally outperforms the Akaike information criterion (AIC) and Bayesian information criterion (BIC), emphasising the importance of including parametric complexity
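
For orientation, the three criteria named in the highlights are commonly written as below. This is a sketch in generic textbook notation, not necessarily the exact form used in the paper: here \hat{\theta} is the maximum likelihood estimate of the p parameters, n is the number of observations, \mathcal{L} is the likelihood, \Theta is the parameter space, and I(\theta) is the per-observation Fisher information matrix. All of these symbols are assumptions introduced for illustration.

```latex
\begin{align}
  \mathrm{AIC} &= -2 \ln \mathcal{L}(\hat{\theta}) + 2p, \\
  \mathrm{BIC} &= -2 \ln \mathcal{L}(\hat{\theta}) + p \ln n, \\
  \mathrm{FIA} &= -\ln \mathcal{L}(\hat{\theta}) + \frac{p}{2} \ln \frac{n}{2\pi}
                 + \ln \int_{\Theta} \sqrt{\det I(\theta)} \, \mathrm{d}\theta .
\end{align}
```

In this form AIC and BIC penalise only the parameter count p (and, for BIC, the sample size n), whereas the final FIA term depends on the geometry of the parameter space, which is one way to read "parametric complexity". Note that AIC and BIC are conventionally stated on a scale of twice the negative log-likelihood, so they would need halving before a term-by-term comparison with FIA.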


Summary

Introduction

Unlike many standard model selection techniques, MDL accounts for the additional statistical complexity induced by how parameters interact. Our method optimises p so that R and N estimates properly and meaningfully adapt to available data. It outperforms comparable Akaike and Bayesian information criteria on several classification problems, given minimal knowledge of the parameter space, and exposes statistical similarities among renewal, skyline and other models in biology.

Incidence curves chart the number of new infecteds observed longitudinally across the epidemic (Wallinga and Teunis, 2004). They provide insight into the ongoing rate of spread of that epidemic by enabling the inference of its effective reproduction number, R.

