Abstract

The statistical power of genome-wide association studies (GWASs) is affected by the effective sample size. However, the privacy and security concerns associated with individual-level genotype data pose great challenges for cross-institutional cooperation. The full-process cryptographic solutions are in demand but havenot been covered, especially the essential principal-component analysis (PCA). Here, we present TrustGWAS, a complete solution for secure, large-scale GWAS, recapitulating gold standard results against PLINK without compromising privacy and supporting basic PLINK steps including quality control, linkage disequilibrium pruning, PCA, chi-square test, Cochran-Armitage trend test, covariate-supported logistic regression and linear regression, and their sequential combinations. TrustGWAS leverages pseudorandom number perturbations for PCA and multiparty scheme of multi-key homomorphic encryption for all other modules. TrustGWAS can evaluate 100,000 individuals with 1 million variants and complete QC-LD-PCA-regression workflow within 50 h. We further successfully discover gene loci associated with fasting blood glucose, consistent with the findings of the ChinaMAP project.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call