Abstract

Background: In ecology and forensics, some population assignment techniques use molecular markers to assign individuals to known groups. However, assigning individuals to known populations can be difficult if the level of genetic differentiation among populations is small. Most assignment studies handle independent markers, often by pruning markers in Linkage Disequilibrium (LD), ignoring the information contained in the correlation among markers due to LD.

Results: To improve the accuracy of population assignment, we present an algorithm, implemented in the HaploPOP software, that combines markers into haplotypes without requiring independence. The algorithm is based on the Gain of Informativeness for Assignment, which provides a measure to decide whether or not a pair of markers should be combined into haplotypes in order to improve assignment. Because complete exploration of all possible solutions for constructing haplotypes is computationally prohibitive, our approach uses a greedy algorithm based on windows of fixed sizes. We evaluate the performance of HaploPOP in assigning individuals to populations using a split-validation approach. We investigate both simulated SNP data and dense genotype data from individuals from Spain and Portugal.

Conclusions: Our results show that constructing haplotypes with HaploPOP can substantially reduce assignment error. The HaploPOP software is freely available as a command-line program at www.ieg.uu.se/Jakobsson/software/HaploPOP/.

Highlights

  • In ecology and forensics, some population assignment techniques use molecular markers to assign individuals to known groups

  • Gain of Informativeness for Assignment (GIA) is defined as the difference between the ancestry information carried by two markers and the ancestry information carried by the haplotypes resulting from the combination of the two markers

  • The Gain of Informativeness for Assignment (GIA) is a one-dimensional statistic that provides a criterion to decide whether markers should be combined into haplotypes in order to improve population assignment [10]
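To make the GIA idea concrete, here is a minimal Python sketch. It assumes that "ancestry information" is measured with the informativeness for assignment statistic I_n of Rosenberg et al. (2003), and that GIA is the haplotype I_n minus the sum of the two single-marker I_n values. Both choices are assumptions made for illustration. The function names (`informativeness`, `allele_freqs`, `gia`) are hypothetical and are not part of HaploPOP.

```python
import math
from collections import Counter


def informativeness(freqs):
    """I_n for one locus (Rosenberg et al. 2003).

    freqs: one dict per population, mapping allele -> frequency.
    """
    k = len(freqs)
    alleles = set().union(*freqs)
    total = 0.0
    for allele in alleles:
        per_pop = [f.get(allele, 0.0) for f in freqs]
        mean = sum(per_pop) / k
        if mean > 0:
            total -= mean * math.log(mean)
        total += sum(p * math.log(p) for p in per_pop if p > 0) / k
    return total


def allele_freqs(observations):
    """Relative frequency of each distinct value in a list."""
    n = len(observations)
    return {a: c / n for a, c in Counter(observations).items()}


def gia(haps_by_pop):
    """Assumed GIA: I_n(haplotype) - (I_n(marker 1) + I_n(marker 2)).

    haps_by_pop: one list per population of phased two-marker
    haplotypes, each given as a tuple such as ("A", "b").
    """
    hap = informativeness([allele_freqs(h) for h in haps_by_pop])
    m1 = informativeness([allele_freqs([h[0] for h in haps]) for haps in haps_by_pop])
    m2 = informativeness([allele_freqs([h[1] for h in haps]) for haps in haps_by_pop])
    return hap - (m1 + m2)


# Toy data: each marker alone has identical allele frequencies in both
# populations, but the haplotypes differ completely between them.
pop1 = [("A", "B"), ("a", "b")] * 10
pop2 = [("A", "b"), ("a", "B")] * 10
gain = gia([pop1, pop2])

# Perfectly correlated, fully differentiated markers: the haplotype adds
# nothing beyond one marker, so the assumed GIA is negative.
redundant = gia([[("A", "B")] * 10, [("a", "b")] * 10])
```

In the first toy case neither marker is informative on its own, yet the haplotypes separate the two populations perfectly, so the gain is positive. In the second case combining the markers is not worthwhile under this definition.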


Summary

Introduction

In ecology and forensics, some population assignment techniques use molecular markers to assign individuals to known groups. Dense datasets tend to contain increasingly correlated markers because Single Nucleotide Polymorphisms (SNPs) that are physically close on a chromosome are often in Linkage Disequilibrium (LD). Such correlations are usually perceived as a nuisance factor in statistical analyses since they violate a common assumption of independence among markers. Gattepaille and Jakobsson [10] introduced the Gain of Informativeness for Assignment (GIA), a statistic that measures the gain in information for population assignment obtained by combining two markers into haplotypes. A major combinatorial challenge arises when using GIA because of the prohibitively large number of pairs of markers that can be combined into haplotypes.
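The abstract states that this combinatorial problem is handled with a greedy algorithm based on windows of fixed sizes. The sketch below illustrates one way such a scheme could work; it is not HaploPOP's actual procedure, whose details are not given here. It assumes non-overlapping windows, a caller-supplied `score(i, j)` function standing in for GIA, and that only pairs with a strictly positive score are combined. The name `greedy_pairs` and the toy scores are hypothetical.

```python
from itertools import combinations


def greedy_pairs(n_markers, window, score):
    """Greedily pair markers inside fixed-size, non-overlapping windows.

    score(i, j) is assumed to return a GIA-like gain for combining
    markers i and j. Returns (pairs, singles).
    """
    pairs = []
    used = set()
    for start in range(0, n_markers, window):
        idx = range(start, min(start + window, n_markers))
        # Rank every pair inside the window, best gain first.
        candidates = sorted(
            ((score(i, j), i, j) for i, j in combinations(idx, 2)),
            reverse=True,
        )
        for gain, i, j in candidates:
            if gain <= 0:
                break  # remaining pairs would not improve assignment
            if i not in used and j not in used:
                pairs.append((i, j))
                used.update((i, j))
    singles = [m for m in range(n_markers) if m not in used]
    return pairs, singles


# Toy gains for six markers split into two windows of three.
toy = {
    (0, 1): 0.2, (0, 2): 0.5, (1, 2): 0.1,
    (3, 4): -0.1, (3, 5): 0.0, (4, 5): 0.3,
}
pairs, singles = greedy_pairs(6, 3, lambda i, j: toy[(i, j)])
```

With windows, only pairs inside the same window are scored: 6 evaluations here instead of the 15 needed for all pairs of six markers. The saving grows quickly with the number of markers, at the cost of never pairing markers that fall in different windows.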
