Abstract
Background: The estimation of individual ancestry from genetic data has become essential to applied population genetics and genetic epidemiology. Software programs for calculating ancestry estimates have become essential tools in the geneticist's analytic arsenal.

Results: Here we describe four enhancements to ADMIXTURE, a high-performance tool for estimating individual ancestries and population allele frequencies from SNP (single nucleotide polymorphism) data. First, ADMIXTURE can be used to estimate the number of underlying populations through cross-validation. Second, individuals of known ancestry can be exploited in supervised learning to yield more precise ancestry estimates. Third, by penalizing small admixture coefficients for each individual, one can encourage model parsimony, often yielding more interpretable results for small datasets or datasets with large numbers of ancestral populations. Finally, by exploiting multiple processors, large datasets can be analyzed even more rapidly.

Conclusions: The enhancements we have described make ADMIXTURE a more accurate, efficient, and versatile tool for ancestry estimation.
Highlights
The estimation of individual ancestry from genetic data has become essential to applied population genetics and genetic epidemiology
Figure 1 demonstrates the effectiveness of cross-validation on several datasets culled from HapMap 3 [10]
While we have not performed extensive simulation studies, our experience has shown that the success of cross-validation depends in part on the degree of differentiation between the populations under study as quantified by Wright’s fixation index FST
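As a rough illustration of how cross-validation can be used to pick the number of populations K, the sketch below collects one cross-validation error per candidate K and returns the K with the smallest error. It assumes the errors appear in log text on lines of the form `CV error (K=3): 0.52`; that line format and the helper names `parse_cv_errors` and `best_k` are assumptions made for this example and are not taken from the text above.

```python
import re

# Assumed log line format: "CV error (K=<int>): <float>"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_text):
    """Return a dict mapping each K found in the log text to its CV error."""
    errors = {}
    for match in CV_LINE.finditer(log_text):
        errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors):
    """Return the K with the lowest CV error; ties go to the smaller K."""
    if not errors:
        raise ValueError("no cross-validation errors found")
    return min(errors, key=lambda k: (errors[k], k))
```

In practice one would concatenate the logs from runs at K = 1, 2, 3, ... and pass the combined text to `parse_cv_errors`. When populations are only weakly differentiated, the error curve may be nearly flat, so the minimum is best read as a guide rather than a definitive answer.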
Summary
We describe four enhancements to ADMIXTURE, a high-performance tool for estimating individual ancestries and population allele frequencies from SNP (single nucleotide polymorphism) data. ADMIXTURE can be used to estimate the number of underlying populations through cross-validation. Individuals of known ancestry can be exploited in supervised learning to yield more precise ancestry estimates. By penalizing small admixture coefficients for each individual, one can encourage model parsimony, often yielding more interpretable results for small datasets or datasets with large numbers of ancestral populations. By exploiting multiple processors, large datasets can be analyzed even more rapidly.
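To make the idea of penalizing small admixture coefficients concrete, here is a minimal sketch of one possible sparsity penalty. The summary does not state the penalty's functional form, so the function below is an assumption: a smooth approximation to counting the nonzero coefficients of one individual, with a hypothetical tuning parameter `eps`. A coefficient of 0 contributes 0, a coefficient of 1 contributes 1, and small positive coefficients are charged disproportionately, which pushes them toward zero when the penalty is subtracted from the log-likelihood.

```python
import math


def sparsity_penalty(q_row, eps=0.01):
    """Approximate count of nonzero admixture coefficients for one individual.

    q_row: ancestry fractions for one individual (non-negative, summing to 1).
    eps:   hypothetical smoothing parameter; smaller values track the
           nonzero count more closely.
    """
    norm = math.log1p(1.0 / eps)  # scales the penalty so that q = 1 costs 1
    return sum(math.log1p(q / eps) / norm for q in q_row)


def penalized_objective(loglik, q_rows, lam=1.0, eps=0.01):
    """Log-likelihood minus lam times the total penalty over all individuals."""
    return loglik - lam * sum(sparsity_penalty(row, eps) for row in q_rows)
```

Under this assumed form, an individual assigned entirely to one population incurs a penalty of 1, while one spread evenly over three populations incurs a noticeably larger penalty, so the parsimonious solution is favored when the likelihoods are comparable.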