Robust inference of population size histories from genomic sequencing data.

Gautam Upadhya, Matthias Steinrücken

doi:10.1371/journal.pcbi.1010419

Open Access

https://doi.org/10.1371/journal.pcbi.1010419

Journal: PLOS Computational Biology	Publication Date: Sep 16, 2022
Citations: 7	License type: CC BY 4.0

Affiliation: University of Chicago

Abstract

Unraveling the complex demographic histories of natural populations is a central problem in population genetics. Understanding past demographic events is of general anthropological interest, but is also an important step in establishing accurate null models when identifying adaptive or disease-associated genetic variation. Coalescent Hidden Markov Models (CHMMs) are an important class of tools for inferring past population size changes from genomic sequence data. These models make efficient use of the linkage information in population genomic datasets by using the local genealogies relating sampled individuals as latent states that evolve along the chromosome in an HMM framework. Extending these models to large sample sizes is challenging, since the number of possible latent states increases rapidly. Here, we present our method CHIMP (CHMM History-Inference Maximum-Likelihood Procedure), a novel CHMM method for inferring the size history of a population. It can be applied to large samples (hundreds of haplotypes) and only requires unphased genomes as input. The two implementations of CHIMP that we present here use either the height of the genealogical tree (TMRCA) or the total branch length, respectively, as the latent variable at each position in the genome. The requisite transition and emission probabilities are obtained by numerically solving certain systems of differential equations derived from the ancestral process with recombination. The parameters of the population size history are subsequently inferred using an Expectation-Maximization algorithm. In addition, we implement a composite likelihood scheme to allow the method to scale to large sample sizes. We demonstrate the efficiency and accuracy of our method in a variety of benchmark tests using simulated data and present comparisons to other state-of-the-art methods. Specifically, our implementation using TMRCA as the latent variable shows comparable performance and provides accurate estimates of effective population sizes in intermediate and ancient times. Our method is agnostic to the phasing of the data, which makes it a promising alternative in scenarios where high-quality data is not available, and has potential applications for pseudo-haploid data.
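To make the HMM framing concrete, the following is a minimal Python sketch of a likelihood computation over discretized TMRCA states. It is not CHIMP's implementation. CHIMP obtains its transition and emission probabilities by numerically solving differential equations, whereas this toy model assumes closed forms for two lineages in a constant-size population. All names and parameters here (tmrca_bin_probs, build_toy_model, forward_log_likelihood, theta, rho, the bin boundaries) are hypothetical, and the "redraw from the prior on recombination" transition is a deliberate simplification.

```python
import math


def tmrca_bin_probs(boundaries):
    """Prior mass of each TMRCA bin for two lineages under constant size.

    Assumption: time is in coalescent units, so TMRCA ~ Exponential(1).
    `boundaries` are the lower edges of the bins; the last bin is open-ended.
    """
    edges = list(boundaries) + [math.inf]
    return [math.exp(-lo) - math.exp(-hi) for lo, hi in zip(edges[:-1], edges[1:])]


def build_toy_model(boundaries, rep_times, theta, rho):
    """Build (init, trans, emit) for a toy discretized-TMRCA HMM.

    Assumptions (illustrative only):
    - transition: with probability rho the TMRCA is redrawn from the prior,
      otherwise it stays in the same bin;
    - emission: a site differs between the two haplotypes with probability
      1 - exp(-theta * t), where t is the bin's representative TMRCA.
    """
    init = tmrca_bin_probs(boundaries)
    n = len(init)
    trans = [
        [(1.0 - rho) * (1.0 if j == k else 0.0) + rho * init[k] for k in range(n)]
        for j in range(n)
    ]
    # emit[k][0] = P(sites identical | bin k), emit[k][1] = P(sites differ | bin k)
    emit = [[math.exp(-theta * t), 1.0 - math.exp(-theta * t)] for t in rep_times]
    return init, trans, emit


def forward_log_likelihood(init, trans, emit, obs):
    """Scaled forward algorithm: returns log P(obs) for a discrete HMM."""
    n = len(init)
    alpha = list(init)  # distribution over states before the first site
    log_lik = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            # Propagate one site along the chromosome.
            alpha = [sum(alpha[j] * trans[j][k] for j in range(n)) for k in range(n)]
        # Weight by the emission probability of the observed site.
        alpha = [alpha[k] * emit[k][o] for k in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
    return log_lik


if __name__ == "__main__":
    init, trans, emit = build_toy_model(
        boundaries=[0.0, 0.5, 1.5], rep_times=[0.25, 1.0, 2.5], theta=0.1, rho=0.05
    )
    sites = [0, 0, 1, 0, 0, 0, 1, 0]  # 1 marks a heterozygous site
    print(forward_log_likelihood(init, trans, emit, sites))
```

In an Expectation-Maximization setting, forward quantities like these would be paired with a backward pass to compute expected state occupancies, which then drive updates of the population size parameters. That step is omitted here.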

Similar Papers

Computing the joint distribution of the total tree length across loci in populations with variable size
Alexey Miroshnikov ... Matthias Steinrücken
Theoretical Population Biology | VOL. 118
21 Sep 2017

Correction to ‘Historical DNA reveals the demographic history of Atlantic cod (Gadus morhua) in medieval and early modern Iceland’
Guðbjörg Ásta Ólafsdóttir ... Kristen M Westfall
Proceedings of the Royal Society B: Biological Sciences | VOL. 281
07 Sep 2014

Review of population history reconstruction methods in conservation biology
Azamat A Totikov ... Sergei F Kliver
Ecological genetics | VOL. 21
12 May 2023

Conservation Genetics
M.A Beaumont
22 Jul 2003
