Abstract
The ability to infer the parameters of positive selection from genomic data has many important implications, from identifying drug-resistance mutations in viruses to increasing crop yield by genetically integrating favorable alleles. Although it has been well described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. Here, we use simulation to explore the utility of the joint site frequency spectrum to estimate selection and demography simultaneously, including developing an extension of the previously proposed Jaatha program (Mathew et al., 2013). We evaluate both complete and incomplete selective sweeps under an isolation-with-migration model with and without population size change (both population growth and bottlenecks). Results suggest that while it may not be possible to precisely estimate the strength of selection, it is possible to infer the presence of selection while accurately estimating demographic parameters. We further demonstrate that the common assumption of selective neutrality when estimating demographic models may lead to severe biases. Finally, we apply the approach we have developed to better characterize the within-host demographic and selective history of human cytomegalovirus (HCMV) infection using published next-generation sequencing data.
Highlights
Identifying the action of selection using genomic polymorphism data has been a long-sought-after goal of population genetics, and several computational methods have been proposed.
While it has long been appreciated that demographic perturbations may produce patterns of variation similar to those produced under positive selection, and should therefore be taken into account when identifying selected regions (e.g., Robertson, 1975; Andolfatto and Przeworski, 2000; Teshima et al., 2006; Thornton and Jensen, 2007; Siol et al., 2010; Jensen, 2014), it has also been demonstrated that the assumption of an equilibrium population history may bias selection inference (e.g., Jensen et al., 2005).
Under the incomplete sweep scenario, we find that the demographic parameters can be accurately estimated but the selection strength α cannot (Figure 3), though low and high α values appear to be distinguishable.
Summary
Identifying the action of selection using genomic polymorphism data has been a long-sought-after goal of population genetics, and several computational methods have been proposed. One of the most widely used is that of Kim and Stephan (2002), who used a composite likelihood ratio to empirically test models of neutrality against positive selection, a framework on which a number of subsequent methods have been built (e.g., Kim and Nielsen, 2004; Jensen et al., 2005). These approaches assume that the population is at equilibrium, and forgo any understanding of the underlying demographic history of the population. Crisci et al. (2013) recently evaluated several proposed background site frequency spectrum based approaches [including SweepFinder (Nielsen et al., 2005), SweeD (Pavlidis et al., 2013), OmegaPlus (Alachiotis et al., 2012), and iHS (Voight et al., 2006)]. Though they demonstrated that the linkage disequilibrium based approaches perform better, they described a high false positive rate and a low true positive rate under a great variety of models – most notably those including severe bottlenecks.
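The joint site frequency spectrum mentioned in the abstract can be illustrated with a minimal sketch. This is not the Jaatha implementation or the authors' code. The function name `joint_sfs`, the 0/1 haplotype-matrix input format, and the toy data are all assumptions made for illustration. Entry [i, j] of the result counts the sites at which i sampled chromosomes from population 1 and j from population 2 carry the derived allele.

```python
import numpy as np


def joint_sfs(hap1, hap2):
    """Tabulate a joint site frequency spectrum for two populations.

    hap1, hap2: 0/1 arrays of shape (n_samples, n_sites), where 1 marks the
    derived allele (hypothetical input format, chosen for this sketch).
    Returns an (n1 + 1) x (n2 + 1) integer array of site counts.
    """
    hap1 = np.asarray(hap1)
    hap2 = np.asarray(hap2)
    if hap1.shape[1] != hap2.shape[1]:
        raise ValueError("both populations must cover the same sites")

    n1, n2 = hap1.shape[0], hap2.shape[0]
    # Derived-allele count per site in each population.
    counts1 = hap1.sum(axis=0)
    counts2 = hap2.sum(axis=0)

    jsfs = np.zeros((n1 + 1, n2 + 1), dtype=int)
    # np.add.at accumulates correctly when an index pair occurs more than once.
    np.add.at(jsfs, (counts1, counts2), 1)
    return jsfs


# Toy data (made up): 4 chromosomes from population 1, 3 from population 2,
# typed at the same 5 sites.
pop1 = [[1, 1, 0, 1, 0],
        [0, 1, 0, 1, 1],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0]]
pop2 = [[0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0]]

jsfs = joint_sfs(pop1, pop2)
print(jsfs)
```

Summing the table over one axis recovers the single-population spectrum of the other population. A full two-dimensional table, rather than two separate spectra, is what lets such a summary carry information about divergence and migration between populations.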