Abstract

Human immunodeficiency virus (HIV) is a rapidly evolving pathogen that causes chronic infections, so genetic diversity within a single infection can be very high. High-throughput "deep" sequencing can now measure this diversity in unprecedented detail, particularly since it can be performed at different time points during an infection, and this offers a potentially powerful way to infer the evolutionary dynamics of the intrahost viral population. However, population genomic inference from HIV sequence data is challenging because of high rates of mutation and recombination, rapid demographic changes, and ongoing selective pressures. In this article we develop a new method for inference from HIV deep sequencing data, based on importance sampling of ancestral recombination graphs under a multilocus coalescent model. The approach further extends recent progress in the approximation of so-called conditional sampling distributions, quantities of key interest when approximating coalescent likelihoods. The chief novelties of our method are that it is able to infer rates of recombination and mutation, as well as the effective population size, while handling sampling over different time points and missing data without extra computational difficulty. We apply our method to a data set of HIV-1, in which several hundred sequences were obtained from an infected individual at seven time points over 2 years. We find mutation rate and effective population size estimates to be comparable to those produced by the software BEAST. Additionally, our method is able to produce local recombination rate estimates. The software underlying our method, Coalescenator, is freely available.

Highlights

  • In this article we work within a coalescent framework and in particular its extensions to allow for serially sampled, or heterochronous, sequences (Rodrigo and Felsenstein 1999). (This is in contrast to the usual situation of isochronous sampling at a single fixed time.) The coalescent is a powerful and flexible framework for modeling the genealogy of a large, panmictic population, with many further extensions that incorporate changing population size, recombination, and recurrent mutations

  • Parameter estimation in population genetic models requires computation of the likelihood of the observed data D as a function of the model parameters Φ. We start by describing a method for estimating the likelihood L(Φ) for heterochronous data at two loci, with Φ = (Ne, μ, r). Our method provides a weighted approximation to the posterior distribution of genealogical histories given these data, so it is straightforward to address questions of ancestral inference, such as the time to the most recent common ancestor (TMRCA) of the data
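The idea of a weighted approximation to a posterior over genealogical histories can be illustrated on a deliberately tiny toy model. The sketch below is not the method of the article and does not involve ancestral recombination graphs. It assumes two sequences sampled at the same time, a coalescence time T that is exponential with mean 1 (coalescent units), and a number of pairwise differences s that is Poisson with mean theta * T. The names `importance_sample`, `exact_likelihood`, `theta`, and `theta0` (a "driving value" used to build the proposal) are hypothetical. Under these assumptions the likelihood has a closed form, which makes the estimator easy to check.

```python
import math
import random


def log_poisson_pmf(k, mean):
    """Log probability of observing k events under a Poisson with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_gamma_pdf(t, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at t > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(t) - rate * t)


def importance_sample(s, theta, theta0, n_draws=20000, seed=1):
    """Estimate the likelihood of s differences and the posterior mean TMRCA.

    Toy assumptions (not from the article): T ~ Exp(1), s | T ~ Poisson(theta * T).
    Proposal: the exact posterior of T under the driving value theta0,
    which is Gamma(shape = s + 1, rate = 1 + theta0).
    """
    rng = random.Random(seed)
    shape, rate = s + 1, 1.0 + theta0
    sum_w = 0.0
    sum_wt = 0.0
    for _ in range(n_draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        t = rng.gammavariate(shape, 1.0 / rate)
        # Joint log density of (T = t, data = s) under the target parameter theta.
        log_joint = -t + log_poisson_pmf(s, theta * t)
        # Importance weight = target joint density / proposal density.
        w = math.exp(log_joint - log_gamma_pdf(t, shape, rate))
        sum_w += w
        sum_wt += w * t
    likelihood = sum_w / n_draws          # average weight estimates L(theta)
    posterior_mean_tmrca = sum_wt / sum_w  # self-normalised weighted average
    return likelihood, posterior_mean_tmrca


def exact_likelihood(s, theta):
    """Closed form for the toy model: theta^s / (1 + theta)^(s + 1)."""
    return theta ** s / (1.0 + theta) ** (s + 1)
```

In this toy setting, `importance_sample(3, 2.0, 1.5)` should return a likelihood close to `exact_likelihood(3, 2.0)`, which equals 8/81, and a posterior mean TMRCA close to (s + 1) / (1 + theta) = 4/3. The same weighted draws serve both purposes: their average estimates the likelihood, and their normalised weights approximate the posterior of the latent history.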

Introduction

Ross and Rodrigo 2002; Williamson 2003; Edwards et al. 2006; Lemey et al. 2007; Pybus and Rambaut 2009), and so genetic data from these populations are of medical relevance in addition to providing insight into molecular evolutionary processes. For the study of within-host evolution of HIV, deep sequencing serves as a potentially powerful way to infer the evolutionary and ecological dynamics of the viral population in unprecedented detail, since sequencing can be performed at different time points during infection (Drummond et al. 2003). This is especially important for studying fast-evolving RNA viruses, where the substitution rates and effective population size may change through time. In this article we work within a coalescent framework and in particular its extensions to allow for serially sampled, or heterochronous, sequences (Rodrigo and Felsenstein 1999). (This is in contrast to the usual situation of isochronous sampling at a single fixed time.) The coalescent is a powerful and flexible framework for modeling the genealogy of a large, panmictic population, with many further extensions that incorporate changing population size, recombination, and recurrent mutations (see Hein et al. 2005 for a textbook introduction). It is a crucial component in the inference of the evolutionary dynamics of fast-evolving RNA viruses, which can be combined with epidemiological data in an approach known as phylodynamics (Grenfell et al. 2004). A potentially powerful method of inference under complicated coalescent evolutionary models is to proceed by computationally intensive Monte
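To make the distinction between isochronous and heterochronous sampling concrete, the following minimal sketch simulates the TMRCA of serially sampled sequences under a constant-size coalescent without recombination. It is an illustration only, not the Coalescenator implementation. It assumes a haploid population in which k lineages coalesce at rate k(k-1)/(2 Ne) per generation, and the function name `simulate_tmrca` and its arguments are hypothetical.

```python
import random


def simulate_tmrca(samples, ne, rng):
    """Simulate one TMRCA for serially sampled sequences.

    samples: list of (time_before_present, n_sequences) pairs, in generations,
             with time 0 being the most recent sampling time.
    ne:      effective population size (assumed constant, haploid).
    Returns the time of the most recent common ancestor, measured backwards
    from the most recent sample.
    """
    if sum(n for _, n in samples) < 2:
        raise ValueError("need at least two sequences in total")
    events = sorted(samples)   # most recent sampling time first
    t, k = events[0]           # current time and number of active lineages
    i = 1                      # index of the next (older) sampling time
    while True:
        next_sample = events[i][0] if i < len(events) else float("inf")
        if k >= 2:
            # Waiting time to the next coalescence among k lineages.
            wait = rng.expovariate(k * (k - 1) / (2.0 * ne))
        else:
            wait = float("inf")  # a single lineage cannot coalesce
        if t + wait < next_sample:
            t += wait
            k -= 1
            if k == 1 and i == len(events):
                return t
        else:
            # An older sample is reached first: add its lineages and, by
            # memorylessness of the exponential, restart the clock there.
            t = next_sample
            k += events[i][1]
            i += 1
```

With isochronous input such as `[(0, 2)]`, the simulated TMRCA has mean Ne under these assumptions. With heterochronous input such as `[(0, 5), (100, 5), (200, 5)]`, every simulated TMRCA is at least 200 generations, because lineages from the oldest sample only enter the genealogy at that time.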
