Abstract

Several methods have been proposed to impute genotypes at untyped markers using observed genotypes and genetic data from a reference panel. We used the Genetic Analysis Workshop 16 rheumatoid arthritis case-control dataset to compare the performance of four of these imputation methods: IMPUTE, MACH, PLINK, and fastPHASE. We compared the methods' imputation error rates and the performance of association tests using the imputed data, both when imputing completely untyped markers and when imputing missing genotypes to combine two datasets genotyped at different sets of markers. As expected, all methods performed better for single-nucleotide polymorphisms (SNPs) in high linkage disequilibrium with genotyped SNPs. However, MACH and IMPUTE generated lower imputation error rates than fastPHASE and PLINK. Association tests based on allele "dosage" from MACH and tests based on the posterior probabilities from IMPUTE provided results closest to those based on complete data. Nevertheless, in both situations, none of the imputation-based tests provided the same level of evidence of association as the complete data at SNPs strongly associated with disease.
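To make the quantities in the abstract concrete, the following is a minimal sketch of how an allele dosage and a genotype-level imputation error rate could be computed from posterior genotype probabilities. It assumes each imputed genotype is given as three probabilities for 0, 1, or 2 copies of a counted allele, and it assumes the error rate is the fraction of best-guess genotypes that differ from the true genotypes; the paper may define its error rate differently. All function names and the example values are hypothetical and do not reflect the output formats of MACH or IMPUTE.

```python
from typing import Sequence, Tuple

# Assumed layout: (P(0 copies), P(1 copy), P(2 copies)) of the counted allele.
Posterior = Tuple[float, float, float]


def allele_dosage(post: Posterior) -> float:
    """Expected number of copies of the counted allele (between 0 and 2)."""
    return post[1] + 2.0 * post[2]


def best_guess(post: Posterior) -> int:
    """Genotype (0, 1, or 2 copies) with the highest posterior probability."""
    return max(range(3), key=lambda g: post[g])


def genotype_error_rate(posteriors: Sequence[Posterior],
                        truth: Sequence[int]) -> float:
    """Fraction of best-guess genotypes that differ from the true genotypes."""
    if len(posteriors) != len(truth):
        raise ValueError("posteriors and truth must have the same length")
    if not truth:
        raise ValueError("at least one genotype is required")
    wrong = sum(1 for p, t in zip(posteriors, truth) if best_guess(p) != t)
    return wrong / len(truth)


# Made-up posteriors for four individuals at one masked SNP (illustration only).
posteriors = [
    (0.90, 0.09, 0.01),
    (0.10, 0.80, 0.10),
    (0.05, 0.35, 0.60),
    (0.50, 0.45, 0.05),
]
truth = [0, 1, 2, 1]  # true genotypes, known because the SNP was masked

dosages = [allele_dosage(p) for p in posteriors]
error_rate = genotype_error_rate(posteriors, truth)
print(dosages, error_rate)
```

A dosage keeps the uncertainty of the imputation (the fourth individual gets 0.55 rather than a hard call of 0), which is one reason dosage-based or probability-based tests can behave differently from tests on best-guess genotypes.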

Highlights

  • Indirect association as a result of linkage disequilibrium (LD) is a key factor in genetic association studies

  • The Genetic Analysis Workshop (GAW) 16 Problem 1 dataset provided by the North American Rheumatoid Arthritis Consortium (NARAC) was used

  • The NARAC data consisted of 868 cases of rheumatoid arthritis (RA) and 1194 controls genotyped on the Illumina 550k single-nucleotide polymorphism (SNP) chip

Introduction

Indirect association as a result of linkage disequilibrium (LD) is a key factor in genetic association studies. Because of LD, a disease-susceptibility single-nucleotide polymorphism (SNP) need not be genotyped, as long as it is tagged by a SNP or set of SNPs that are genotyped. This concept has been further exploited by the introduction of methods to impute missing genotypes at untyped markers, based on known genotypes at typed markers and information about LD within the region from a reference panel [1,2,3,4]. We compare the performance of several imputation methods when combining two datasets that have been genotyped at different sets of markers or when completely missing (i.e., “untyped”) markers are analyzed. The Genetic Analysis Workshop (GAW) 16 Problem 1 dataset provided by the North American Rheumatoid Arthritis Consortium (NARAC) was used.

