DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts.

Donghyung Lee, Vladimir I Vladimirov, Silviu-Alin Bacanu, Vernell S Williamson, Brien P Riley, T Bernard Bigdeli, Ayman H Fanous

doi:10.1093/bioinformatics/btv348

Abstract

Motivation: To increase the signal resolution of large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts make genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike the summary statistics provided by virtually all studies, are not publicly available. There are far less demanding methods that avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, but their implicit assumptions make them applicable only to ethnically homogeneous cohorts.

Results: To decrease the computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated or user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, compared with genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of the computational resources.

Availability and implementation: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix.

Contact: dlee4@vcu.edu

Supplementary information: Supplementary Data are available at Bioinformatics online.

Highlights

Genotype imputation methods (Browning and Browning, 2007; Howie et al, 2009; Li et al, 2010; Nicolae, 2006; Servin and Stephens, 2007) are commonly used to increase the genomic resolution for large-scale multi-ethnic meta-analyses (Ripke et al, 2014; Sklar et al, 2011; Sullivan et al, 2013) by predicting genotypes at unmeasured SNPs.
We extend the DIST imputation method/software to Directly Imputing summary STatistics for unmeasured single nucleotide polymorphisms (SNPs) from MIXed ethnicity cohorts (DISTMIX).
The weight estimates are highly accurate: the standard deviation (SD) of every estimate falls below 0.2%.
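
To make the idea in these highlights concrete, the following is a minimal sketch of summary-statistic imputation for a mixed cohort. It is an illustration under stated assumptions, not the DISTMIX implementation: the cohort linkage-disequilibrium (LD) matrix is approximated here by a plain weighted average of per-population correlation matrices, and unmeasured Z-scores are predicted by the standard conditional expectation of a multivariate normal. The function names `mix_ld` and `impute_z`, the `ridge` parameter, and all numbers are hypothetical.

```python
import numpy as np


def mix_ld(ld_by_pop, weights):
    """Weighted average of per-population LD (correlation) matrices.

    Assumption: a plain weighted average stands in for the cohort LD;
    the estimator used by DISTMIX itself may differ.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise the ethnic proportions so they sum to 1
    return sum(wi * np.asarray(ld, dtype=float) for wi, ld in zip(w, ld_by_pop))


def impute_z(ld, measured_idx, unmeasured_idx, z_measured, ridge=0.0):
    """Predict Z-scores at unmeasured SNPs from Z-scores at measured SNPs.

    Uses E[z_u | z_m] = S_um S_mm^{-1} z_m, where S is the LD matrix.
    `ridge` is a hypothetical stabiliser added to the diagonal of S_mm.
    """
    m = np.asarray(measured_idx)
    u = np.asarray(unmeasured_idx)
    s_mm = ld[np.ix_(m, m)] + ridge * np.eye(len(m))
    s_um = ld[np.ix_(u, m)]
    # Solve the linear system rather than forming an explicit inverse.
    z_hat = s_um @ np.linalg.solve(s_mm, np.asarray(z_measured, dtype=float))
    # Per-SNP share of variance explained by the measured SNPs:
    # the diagonal of S_um S_mm^{-1} S_mu.
    info = np.einsum("ij,ji->i", s_um, np.linalg.solve(s_mm, s_um.T))
    return z_hat, info


# Toy example: three SNPs, two reference populations, SNP 2 unmeasured.
ld_pop_a = [[1.0, 0.2, 0.8], [0.2, 1.0, 0.3], [0.8, 0.3, 1.0]]
ld_pop_b = [[1.0, 0.1, 0.4], [0.1, 1.0, 0.5], [0.4, 0.5, 1.0]]
cohort_ld = mix_ld([ld_pop_a, ld_pop_b], weights=[0.7, 0.3])
z_hat, info = impute_z(cohort_ld, [0, 1], [2], z_measured=[3.0, 1.0])
```

In this sketch `info` plays the role of an imputation-quality score: values near 1 mean the unmeasured SNP is well tagged by the measured ones, and values near 0 mean the prediction carries little information.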

Summary

Introduction

Genotype imputation methods (Browning and Browning, 2007; Howie et al, 2009; Li et al, 2010; Nicolae, 2006; Servin and Stephens, 2007) are commonly used to increase the genomic resolution for large-scale multi-ethnic meta-analyses (Ripke et al, 2014; Sklar et al, 2011; Sullivan et al, 2013) by predicting genotypes at unmeasured SNPs. For large consortium meta-analyses [e.g. Psychiatric Genetic Consortium Schizophrenia Phase 2 (PGC SCZ2) (Ripke et al, 2014) and Genetic Investigation of ANthropometric Traits (Allen et al, 2010)], genotype imputation, often repeated over multiple iterations, can be extremely burdensome computationally. In addition, unlike freely available summary statistics, the genotypic data required by genotype imputation methods are accessible only to a limited extent, or at least not in a timely manner. This, in turn, might slow the process of scientific discovery.

Methods

Results

Conclusion