Haplotype Reconstruction Error as a Classical Misclassification Problem: Introducing Sensitivity and Specificity as Error Measures

Iris M Heid, Claudia Lamina, Helmut Küchenhoff, Vincent Macaulay, Friedhelm Bongardt

doi:10.1371/journal.pone.0001853


Open Access



Abstract

Background: Statistically reconstructing haplotypes from single nucleotide polymorphism (SNP) genotypes can lead to falsely classified haplotypes. This can be an issue when interpreting haplotype association results or when selecting subjects with certain haplotypes for subsequent functional studies. Our aim was to quantify haplotype reconstruction error and to provide tools for doing so.

Methods and Results: Across numerous simulation scenarios, we systematically investigated several error measures, including discrepancy, error rate, and R², and introduced sensitivity and specificity to this context. We exemplified several measures in the KORA study, a large population-based study from Southern Germany. We found that specificity was slightly reduced only for common haplotypes, while sensitivity was decreased for some, but not all, rare haplotypes. The overall error rate generally increased with increasing number of loci, increasing minor allele frequency of SNPs, decreasing correlation between the alleles, and increasing ambiguity.

Conclusions: We conclude that, with the analytical approach presented here, haplotype-specific error measures can be computed to gain insight into haplotype uncertainty. The method indicates whether a specific risk haplotype can be expected to be reconstructed with virtually no or with high misclassification, and thus the magnitude of bias to expect in association estimates. We also illustrate that sensitivity and specificity separate two dimensions of the haplotype reconstruction error, which completely describe the misclassification matrix and thus provide the prerequisite for methods accounting for misclassification.

Highlights

Haplotypes have been the subject of considerable attention as they complement the information from single nucleotide polymorphism (SNP) genotypes.
We illustrate that sensitivity and specificity separate two dimensions of the haplotype reconstruction error, which completely describe the misclassification matrix and provide the prerequisite for methods accounting for misclassification.
We propose a classification of error measures based on three characteristics, including: (1) the uncertainty across all haplotypes (1a, "overall error measure") versus the error in a specific haplotype (1b, "haplotype-specific error measure"); (2) the uncertainty in a sample statistic (2a, i.e., haplotype frequencies, f → f*) versus the uncertainty in individuals' haplotypes (2b).

Summary

Introduction

Haplotypes have been the subject of considerable attention as they complement the information from single nucleotide polymorphism (SNP) genotypes. Multilocus haplotypes may capture the linkage disequilibrium (LD) information in a gene better than methods based on single loci [3]. Haplotypes can provide additional information for association analysis and localization of complex disease genes [5], especially in the presence of multiple susceptibility alleles [6]. However, statistically reconstructing haplotypes from SNP genotypes can lead to falsely classified haplotypes. This can be an issue when interpreting haplotype association results or when selecting subjects with certain haplotypes for subsequent functional studies. Our aim was to quantify haplotype reconstruction error and to provide tools for doing so.
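To make the idea of treating reconstruction error as misclassification concrete, the following is a minimal sketch of haplotype-specific sensitivity and specificity, computed by comparing true and reconstructed haplotype pairs. It is an illustration under stated assumptions, not the procedure used in the paper: the function name `haplotype_sens_spec`, the chromosome-level counting rule (copies of a haplotype are matched within each individual), and the toy data are all hypothetical.

```python
from collections import Counter


def haplotype_sens_spec(true_pairs, recon_pairs, haplotype):
    """Chromosome-level sensitivity and specificity for one haplotype.

    Assumption (not taken from the paper): within each individual, copies of
    `haplotype` in the true pair are matched against copies in the
    reconstructed pair, so each individual contributes two chromosomes.
    """
    if len(true_pairs) != len(recon_pairs):
        raise ValueError("true and reconstructed samples differ in size")

    tp = fn = fp = tn = 0
    for true_pair, recon_pair in zip(true_pairs, recon_pairs):
        t = Counter(true_pair)[haplotype]   # true copies: 0, 1 or 2
        r = Counter(recon_pair)[haplotype]  # reconstructed copies: 0, 1 or 2
        matched = min(t, r)
        tp += matched            # true copies that were recovered
        fn += t - matched        # true copies that were missed
        fp += r - matched        # copies assigned although not present
        tn += 2 - max(t, r)      # chromosomes correctly not assigned

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical two-SNP toy data. The last individual is a double
# heterozygote: AT/GC and AC/GT yield the same unphased genotype, so the
# reconstruction can pick the wrong pair.
true_pairs = [("AC", "AC"), ("AC", "GT"), ("GT", "GT"), ("AT", "GC")]
recon_pairs = [("AC", "AC"), ("AC", "GT"), ("GT", "GT"), ("AC", "GT")]

for hap in ("AC", "AT"):
    sens, spec = haplotype_sens_spec(true_pairs, recon_pairs, hap)
    print(f"{hap}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy sample the common haplotype AC is always recovered but is once assigned where it is absent, which lowers its specificity, while the rare haplotype AT is missed, which lowers its sensitivity. The two measures are reported separately because they capture different directions of error.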

Methods

Results

Discussion

Conclusion