Abstract

We recently described a method for linkage disequilibrium (LD) mapping, using cladistic analysis of phased single-nucleotide polymorphism (SNP) haplotypes in a logistic regression framework. However, haplotypes are often not available and cannot be deduced with certainty from the unphased genotypes. One possible two-stage approach is to infer the phase of multilocus genotype data and analyze the resulting haplotypes as if known. Here, haplotypes are inferred using the expectation-maximization (EM) algorithm, and the best-guess phase assignment for each individual is analyzed. However, inferring haplotypes from phase-unknown data is prone to error, and this should be taken into account in the subsequent analysis. An alternative approach is to analyze the phase-unknown multilocus genotypes themselves. Here we present a generalization of the method for phase-known haplotype data to the case of unphased SNP genotypes. Our approach is designed for high-density SNP data, so we opted to analyze the simulated dataset. The marker spacing in the initial screen was too large for our method to be effective, so we used the answers provided to request further data in regions around the disease loci and in null regions. Power to detect the disease loci, accuracy in localizing the true site of the locus, and false-positive error rates are reported for the inferred-haplotype and unphased-genotype methods. For these data, analysis of inferred haplotypes outperforms analysis of unphased genotypes. As expected, our results suggest that when there is little or no LD between a disease locus and the flanking region, there will be no chance of detecting it unless the disease variant itself is genotyped.

Highlights

  • Disease-marker association studies of samples of unrelated cases and controls have been shown to have the potential to map all but extremely rare variants contributing to complex traits [1]

  • We recently described a method [2] for linkage disequilibrium (LD) mapping, using cladistic analysis of single-nucleotide polymorphism (SNP) haplotypes in a logistic regression framework, which allows straightforward incorporation of covariates

  • We propose in this paper a generalization of our cladistic analysis method for haplotypes to analyze unphased genotypes directly

Introduction

Disease-marker association studies of samples of unrelated cases and controls have been shown to have the potential to map all but extremely rare variants contributing to complex traits [1]. A number of statistical approaches have been developed to infer haplotypes and their relative frequencies in a sample and to assign phase to the multilocus genotypes. It is common to employ a two-stage approach of inferring phase and analyzing the 'best' haplotype configuration as if it were known with certainty. The disadvantage of this approach is that it takes no account of the uncertainty in the phase assignment process. To overcome this problem, we propose in this paper a generalization of our cladistic analysis method for haplotypes to analyze unphased genotypes directly. We use the Genetic Analysis Workshop 14 (GAW14) simulated dataset to compare the analysis of unphased genotypes with the analysis of inferred haplotypes in terms of power to detect disease loci and accuracy in locating them.
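
To make the two-stage approach concrete, the following is a minimal sketch of EM haplotype-frequency estimation for the simplest case of two biallelic SNPs. It is an illustration only and not the software or the multilocus procedure used in this study. The function name `em_two_snp`, the 0/1/2 genotype coding (number of copies of allele 1), and the fixed iteration count are all assumptions made for the example. With two SNPs, only double heterozygotes have ambiguous phase, so the E-step reduces to a single posterior weight.

```python
from collections import Counter

# Allele pair carried at one SNP for each genotype code (copies of allele 1).
ALLELE_PAIRS = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def em_two_snp(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    genotypes: non-empty list of (g1, g2) tuples, each g in {0, 1, 2}.
    Returns (freq, w_cis): the haplotype frequencies and the posterior
    probability that a double heterozygote carries haplotypes 00 + 11.
    """
    # Haplotype counts from individuals whose phase is unambiguous.
    fixed = Counter()
    n_double_het = 0
    for g1, g2 in genotypes:
        if (g1, g2) == (1, 1):
            n_double_het += 1  # phase unknown: 00+11 (cis) or 01+10 (trans)
            continue
        a, b = ALLELE_PAIRS[g1], ALLELE_PAIRS[g2]
        fixed[(a[0], b[0])] += 1
        fixed[(a[1], b[1])] += 1

    total = 2 * len(genotypes)  # two haplotypes per individual
    freq = {h: 0.25 for h in [(0, 0), (0, 1), (1, 0), (1, 1)]}
    w_cis = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability of the cis configuration.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        if cis + trans > 0:
            w_cis = cis / (cis + trans)
        # M-step: add expected counts from double heterozygotes.
        expected = {
            (0, 0): n_double_het * w_cis,
            (1, 1): n_double_het * w_cis,
            (0, 1): n_double_het * (1 - w_cis),
            (1, 0): n_double_het * (1 - w_cis),
        }
        freq = {h: (fixed[h] + expected[h]) / total for h in freq}
    return freq, w_cis


# Hypothetical toy sample: 10 + 10 homozygotes and 5 double heterozygotes.
sample = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 5
freq, w_cis = em_two_snp(sample)
# Stage two of the two-stage approach keeps only the most probable phase.
best_guess = "00+11" if w_cis >= 0.5 else "01+10"
```

The best-guess step discards `w_cis` itself, so an individual assigned with probability 0.55 is treated the same as one assigned with probability 0.99. This is the phase uncertainty that the two-stage approach ignores and that direct analysis of unphased genotypes is intended to avoid.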

