Abstract

Background: Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding.

Results: In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history.

Conclusion: Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.

Highlights

  • Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences.

  • An inference of population substructure may compare whether observed single nucleotide polymorphism (SNP) allele frequencies in the current generation are more consistent with what we expect to find in a single population under Hardy-Weinberg equilibrium or with what we expect to see in two genetically isolated populations evolving independently over many generations.

  • We illustrate the use of this library for phylogeny-based inferences with sample applications based on a tree statistic that we call "phylogenetic imperfection." We demonstrate that imperfection shows significant regional and cross-population conservation and that it is significantly predictive of fine-scale recombination rate.
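
The text above describes imperfection only as a measure of the reuse of variant sites within a phylogeny; the exact computation is not given here. As a rough illustration of the underlying idea, the sketch below applies the classical four-gamete test to biallelic SNP haplotypes coded as strings of 0 and 1. If no pair of sites shows all four gametes (00, 01, 10, 11), the region admits a perfect phylogeny in which every site mutates exactly once, so no site is reused. The function names and the toy haplotypes are assumptions made for this example, and the test only detects whether some reuse is required; it does not compute the statistic itself.

```python
from itertools import combinations


def has_four_gametes(haplotypes, i, j):
    """Return True if sites i and j jointly show all four gametes.

    Assumes biallelic sites coded as the characters '0' and '1'.
    """
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def incompatible_site_pairs(haplotypes):
    """List every pair of sites that fails the four-gamete test."""
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if has_four_gametes(haplotypes, i, j)
    ]


# Hypothetical toy data: four haplotypes over three SNP sites.
perfect = ["000", "100", "110", "001"]
imperfect = ["000", "100", "110", "011"]

print(incompatible_site_pairs(perfect))    # no conflicting pairs
print(incompatible_site_pairs(imperfect))  # sites 0 and 1 conflict
```

An empty result means the haplotypes are consistent with a tree in which each variant site changes once. Any reported pair implies that at least one site must mutate more than once (or that recombination occurred) in any tree explaining the data.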

Introduction

Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Examples include methods for identifying sites of frequent recombination (e.g., [4]) or gene conversion (e.g., [5]), identifying conserved haplotype sequences (e.g., [6]), finding genomic regions that have undergone selective sweeps (e.g., [7]), and detecting population substructure (e.g., [8,9]) and admixture (e.g., [10,11]). All of these inference methods work by a common principle of superimposing a mathematical model of the evolutionary event or process to be detected on a model of neutral evolution in the absence of that process. Information on these past genetic sequences, commonly encoded in phylogenetic trees or networks, is not generally directly observable, but it too can be computationally inferred.
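
To make the idea of comparing data against a neutral baseline concrete, here is a minimal sketch of one such comparison for population substructure: a chi-square statistic measuring how far observed genotype counts at a single biallelic SNP depart from Hardy-Weinberg expectations. This is a textbook goodness-of-fit test, not the method of any of the cited references, and the function name and genotype counts are assumptions chosen for illustration.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions.

    Arguments are observed genotype counts at one biallelic SNP.
    Under Hardy-Weinberg equilibrium the statistic is approximately
    chi-square distributed with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: a single randomly mating population.
print(hwe_chi_square(25, 50, 25))

# Hypothetical counts: two isolated populations pooled together, one fixed
# for allele A and the other for allele B, so heterozygotes are missing.
print(hwe_chi_square(50, 0, 50))
```

The first sample matches Hardy-Weinberg proportions exactly, so its statistic is zero. The pooled sample shows the heterozygote deficit expected when isolated populations are mixed, and its statistic far exceeds 3.84, the 5% critical value for one degree of freedom.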
