Abstract

Pathogen sequence data have been exploited to infer who infected whom, using both empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host pathogen populations. These techniques therefore provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight into epidemiological links than a single sequence per host. In general, a mechanistic view of transmission and micro-evolution has been adopted to infer epidemiological links from these data. Here, we investigate an alternative approach grounded in statistical learning. The idea is to learn the structure of epidemiological links by applying a pseudo-evolutionary model to training data (obtained, for example, from contact tracing), and then to use this initial stage to infer links across the whole dataset. Such an approach is potentially most valuable when mechanistic assumptions risk being erroneous; it is also parsimonious enough to handle big datasets in the future, and versatile enough to be applied in very different contexts in animal, human and plant epidemiology. This article is part of the theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes’. This issue is linked with the subsequent theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control’.
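The train-then-infer workflow described above can be illustrated with a deliberately simplified Python sketch. It is not the authors' SLAFEEL implementation and does not use their pseudo-evolutionary model; it swaps in a much simpler learner for illustration only. Each pair of hosts is summarised by the mean Hamming distance between their within-host variants, a distance cut-off is learned from pairs whose link status is assumed known (for example from contact tracing), and that cut-off is then applied to every pair in the dataset. The function names, the distance measure and the toy sequences are all hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def host_distance(sample_a: list[str], sample_b: list[str]) -> float:
    """Mean Hamming distance over all cross-host variant pairs.

    Comparing every sampled variant (rather than one sequence per host)
    is what lets within-host diversity enter the comparison.
    """
    total = sum(hamming(a, b) for a in sample_a for b in sample_b)
    return total / (len(sample_a) * len(sample_b))


def learn_threshold(train_pairs, samples) -> float:
    """Choose the distance cut-off that misclassifies the fewest labelled pairs.

    train_pairs: list of (host_i, host_j, linked) tuples, e.g. from contact tracing
    (hypothetical labels). Ties are resolved in favour of the smallest cut-off.
    """
    scored = [(host_distance(samples[i], samples[j]), linked)
              for i, j, linked in train_pairs]
    if not scored:
        raise ValueError("at least one labelled pair is required")

    def errors(t: float) -> int:
        # A pair is predicted 'linked' when its distance is at most t.
        return sum((d <= t) != linked for d, linked in scored)

    return min(sorted({d for d, _ in scored}), key=errors)


def infer_links(samples, threshold: float) -> list[tuple[str, str]]:
    """Apply the learned cut-off to every pair of hosts in the full dataset."""
    return [(i, j) for i, j in combinations(sorted(samples), 2)
            if host_distance(samples[i], samples[j]) <= threshold]


# Toy data (hypothetical): each host maps to its sampled within-host variants.
samples = {
    "A": ["ACGTAC", "ACGTAT"],
    "B": ["ACGTAC", "ACGAAC"],
    "C": ["TTGCAG", "TTGCAA"],
    "D": ["ACGTAA"],
}
training = [("A", "B", True), ("A", "C", False), ("B", "C", False)]
cutoff = learn_threshold(training, samples)
print(infer_links(samples, cutoff))  # → [('A', 'B'), ('A', 'D')]
```

Here host D carries no training label, yet the learned cut-off assigns it a link to A. This mirrors, in a very reduced form, the idea of learning from a labelled subset and then inferring links for the whole dataset.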

Highlights

  • In order to most effectively predict and control the spread of infectious diseases, we need to better understand how pathogens spread within and between host populations and assess the role of the environment in the transmissions

  • We introduce an exploratory approach, called SLAFEEL, for quantitatively investigating epidemiological links between host units from deep sequencing data

  • This versatile approach, grounded in statistical learning, is adaptable to diverse contexts and data. We applied it to analyse virus dynamics in humans, animals and plants at different spatial scales, using data that were obtained with different sequencing techniques and that show different levels of pathogen diversity


Summary

Introduction

In order to most effectively predict and control the spread of infectious diseases, we need to better understand how pathogens spread within and between host populations and assess the role of the environment in the transmissions. We consider the case where we observe numerous host units infected by an endemic or epidemic infectious disease, and ask how the pathogens spread among them. For fast-evolving pathogens, numerous approaches exploiting pathogen sequence data have been developed with the aim of inferring who infected whom or who is closely related to whom. These approaches are grounded in a wide variety of principles, from those based on statistical metrics to those
