Abstract

Background: We explored the imputation performance of the program IMPUTE in an admixed sample from Mexico City. The following issues were evaluated: (a) the impact of different reference panels (HapMap vs. 1000 Genomes) on imputation; (b) potential differences in imputation performance between single-step and two-step (phasing and imputation) approaches; (c) the effect of different INFO score thresholds on imputation performance; and (d) imputation performance in common vs. rare markers.

Methods: The sample from Mexico City comprised 1,310 individuals genotyped with the Affymetrix 5.0 array. We randomly masked 5% of the markers directly genotyped on chromosome 12 (n = 1,046) and compared the imputed genotypes with the microarray genotype calls. Imputation was carried out with the program IMPUTE. The concordance rates between the imputed and observed genotypes were used as a measure of imputation accuracy and the proportion of non-missing genotypes as a measure of imputation efficacy.

Results: The single-step imputation approach produced slightly higher concordance rates than the two-step strategy (99.1% vs. 98.4% when using the HapMap phase II combined panel), but at the expense of a lower proportion of non-missing genotypes (85.5% vs. 90.1%). The 1000 Genomes reference sample produced similar concordance rates to the HapMap phase II panel (98.4% for both datasets, using the two-step strategy). However, the 1000 Genomes reference sample substantially increased the proportion of non-missing genotypes (94.7% vs. 90.1%). Rare variants (frequency <1%) had lower imputation accuracy and efficacy than common markers.

Conclusions: The program IMPUTE had an excellent imputation performance for common alleles in an admixed sample from Mexico City, which has primarily Native American (62%) and European (33%) contributions. Genotype concordances were higher than 98.4% using all the imputation strategies, in spite of the fact that no Native American samples are present in the HapMap and 1000 Genomes reference panels. The best balance of imputation accuracy and efficiency was obtained with the 1000 Genomes panel. Rare variants were not captured effectively by any of the available panels, emphasizing the need to be cautious in the interpretation of association results for imputed rare variants.

Highlights

  • We explored the imputation performance of the program IMPUTE in an admixed sample from Mexico City

  • Reference panels for imputation: The following reference panels were used for the present study: (a) the HapMap phase II combined sample, which includes up to 4 million SNPs typed in 269 individuals of East Asian, European, and West African ancestry; (b) the HapMap phase II combined sample along with the HapMap phase III Mexican-American LA sample (MXL), which was genotyped for about 1.4 million SNPs; and (c) the 1000 Genomes phase I sample (June 2011 release), which comprises >37 million autosomal SNPs typed in 1,094 individuals from populations around the world

  • The concordance rate was used as a measure of the imputation accuracy and the proportion of non-missing genotypes as a measure of imputation efficacy
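
The two measures in the last highlight can be illustrated with a minimal Python sketch. It assumes that each imputed genotype comes as three posterior probabilities (for 0, 1, or 2 copies of an allele) and that a call is made only when the largest probability reaches a calling threshold. The threshold of 0.9 and the names `call_genotype` and `accuracy_and_efficacy` are illustrative assumptions, not details taken from the study.

```python
from typing import Optional, Sequence, Tuple


def call_genotype(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Return the most probable genotype (0, 1, or 2), or None if it is below the threshold.

    The threshold value is an assumption made for illustration.
    """
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def accuracy_and_efficacy(
    observed: Sequence[Optional[int]],
    imputed_probs: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> Tuple[float, float]:
    """Compare imputed calls at masked markers with the observed array genotypes.

    Returns (concordance, efficacy):
      concordance = matching calls / non-missing imputed calls
      efficacy    = non-missing imputed calls / genotypes compared
    """
    total = called = concordant = 0
    for obs, probs in zip(observed, imputed_probs):
        if obs is None:  # skip genotypes missing on the array itself
            continue
        total += 1
        call = call_genotype(probs, threshold)
        if call is None:  # imputed genotype left as missing
            continue
        called += 1
        if call == obs:
            concordant += 1
    concordance = concordant / called if called else float("nan")
    efficacy = called / total if total else float("nan")
    return concordance, efficacy


# Toy data: four masked genotypes and their imputed posterior probabilities.
observed = [0, 1, 2, 1]
imputed = [
    (0.98, 0.02, 0.00),  # called 0, matches
    (0.05, 0.93, 0.02),  # called 1, matches
    (0.10, 0.30, 0.60),  # below threshold, left missing
    (0.95, 0.04, 0.01),  # called 0, mismatch
]
concordance, efficacy = accuracy_and_efficacy(observed, imputed)
print(f"concordance={concordance:.3f}, efficacy={efficacy:.2f}")
```

In this sketch a stricter threshold tends to raise concordance while lowering the proportion of non-missing genotypes, which is the kind of trade-off between accuracy and efficacy that the abstract reports across imputation strategies.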

Summary

Introduction

We explored the imputation performance of the program IMPUTE in an admixed sample from Mexico City. To overcome the limitations of GWAS genotyping platforms, a variety of imputation methods have been developed. These methods infer missing or untyped SNP genotypes based on the genotypes at nearby typed SNPs, using the pattern of linkage disequilibrium (LD) observed in reference samples. The main challenge of imputation lies in the selection of an appropriate reference panel relevant for the study population. This is straightforward in samples with ancestry matching that of the available reference panels (e.g., European or East Asian ancestry), but it is not the case for samples that are not well represented in the reference panels (e.g., Native American samples or admixed samples). It has been described that this strategy results in good imputation accuracy [16].
