Abstract

Most sequenced genomes are currently stored in strict access-controlled repositories [1–3]. Free access to these data could improve the power of genome-wide association studies (GWAS) to identify disease-causing genetic variants and may aid in the discovery of new drug targets [4,5]. However, concerns over genetic data privacy [6–9] may deter individuals from contributing their genomes to scientific studies [10] and, in many cases, prevent researchers from sharing data with the scientific community [11]. Although several cryptographic techniques for secure data analysis exist [12–14], none scales to computationally intensive analyses, such as GWAS. Here we describe an end-to-end protocol for large-scale genome-wide analysis that facilitates quality control and population stratification correction in datasets of 9K, 13K, and 23K individuals while maintaining the confidentiality of underlying genotypes and phenotypes. We show the protocol could feasibly scale to a million individuals. This approach may help to make currently restricted data available to the scientific community and could potentially enable ‘secure genome crowdsourcing,’ allowing individuals to contribute their genomes to a study without compromising their privacy.

