Abstract

Background

Statistically reconstructing haplotypes from single nucleotide polymorphism (SNP) genotypes can lead to falsely classified haplotypes. This can be an issue when interpreting haplotype association results or when selecting subjects with certain haplotypes for subsequent functional studies. Our aim was to quantify haplotype reconstruction error and to provide tools for doing so.

Methods and Results

Across numerous simulation scenarios, we systematically investigated several error measures, including discrepancy, error rate, and R², and introduced sensitivity and specificity to this context. We exemplified several measures in the KORA study, a large population-based study from Southern Germany. We found that the specificity was slightly reduced only for common haplotypes, while the sensitivity was decreased for some, but not all, rare haplotypes. The overall error rate generally increased with increasing number of loci, increasing minor allele frequency of SNPs, decreasing correlation between the alleles, and increasing ambiguity.

Conclusions

We conclude that, with the analytical approach presented here, haplotype-specific error measures can be computed to gain insight into the haplotype uncertainty. The method indicates whether a specific risk haplotype can be expected to be reconstructed with almost no or with high misclassification, and thus the magnitude of bias to expect in association estimates. We also illustrate that sensitivity and specificity separate two dimensions of the haplotype reconstruction error, which completely describe the misclassification matrix and thus provide the prerequisite for methods accounting for misclassification.
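To make the two haplotype-specific measures concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the usual definitions: for a given haplotype, sensitivity is the share of chromosomes truly carrying it that are reconstructed as that haplotype, and specificity is the share of chromosomes not carrying it that are reconstructed as something else. The function name `sens_spec`, the haplotype labels, and the toy data are hypothetical and chosen only for illustration.

```python
def sens_spec(true_haps, recon_haps, target):
    """Return (sensitivity, specificity) for one haplotype.

    Each position is one chromosome: true_haps[i] is its true haplotype,
    recon_haps[i] the statistically reconstructed one (assumed inputs).
    """
    if len(true_haps) != len(recon_haps):
        raise ValueError("true and reconstructed lists must have equal length")

    tp = fn = fp = tn = 0
    for true_h, recon_h in zip(true_haps, recon_haps):
        if true_h == target:
            if recon_h == target:
                tp += 1  # carrier correctly reconstructed
            else:
                fn += 1  # carrier missed
        else:
            if recon_h == target:
                fp += 1  # non-carrier wrongly assigned the target
            else:
                tn += 1  # non-carrier correctly not assigned the target

    # Undefined (NaN) when the haplotype is absent or ubiquitous.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up toy data: the single rare "AT" chromosome is reconstructed
# as the common "AC" haplotype.
TRUE_HAPS = ["AC", "AC", "AC", "GT", "GT", "AT", "AC", "GT"]
RECON_HAPS = ["AC", "AC", "AC", "GT", "GT", "AC", "AC", "GT"]

if __name__ == "__main__":
    for hap in ("AC", "GT", "AT"):
        sens, spec = sens_spec(TRUE_HAPS, RECON_HAPS, hap)
        print(f"{hap}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data the single misclassification lowers the sensitivity of the rare haplotype (AT) and the specificity of the common haplotype (AC), which is the two-dimensional separation of error that the abstract describes.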

Highlights

  • Haplotypes have been the subject of considerable attention as they complement the information from the single nucleotide polymorphism (SNP) genotypes

  • We illustrate that sensitivity and specificity separate two dimensions of the haplotype reconstruction error, which completely describe the misclassification matrix and provide the prerequisite for methods accounting for misclassification

  • We propose a classification based on three characteristics: (1) the uncertainty across all haplotypes (1a, “overall error measure”) versus the error in a specific haplotype (1b, “haplotype-specific error measure”); (2) the uncertainty in a sample statistic (2a, i.e., haplotype frequencies, fRf*) versus the uncertainty in individuals’ haplotypes (2b)


Summary

Introduction

Haplotypes have been the subject of considerable attention as they complement the information from single nucleotide polymorphism (SNP) genotypes. Multilocus haplotypes may capture the linkage disequilibrium (LD) information in a gene better than methods based on single loci [3]. Haplotypes can provide additional information with respect to association analysis and localization of complex disease genes [5], especially in the presence of multiple susceptibility alleles [6]. However, statistically reconstructing haplotypes from SNP genotypes can lead to falsely classified haplotypes. This can be an issue when interpreting haplotype association results or when selecting subjects with certain haplotypes for subsequent functional studies. Our aim was to quantify haplotype reconstruction error and to provide tools for doing so.

