Abstract

The outsourcing of genomic data into public cloud computing settings raises concerns over privacy and security. Significant advancements in secure computation methods have emerged over the past several years, but such techniques need to be rigorously evaluated for their ability to support the analysis of human genomic data in an efficient and cost-effective manner. With respect to public cloud environments, there are concerns about the inadvertent exposure of human genomic data to unauthorized users. In analyses involving multiple institutions, there is additional concern about data being used beyond agreed research scope and being prcoessed in untrused computational environments, which may not satisfy institutional policies. To systematically investigate these issues, the NIH-funded National Center for Biomedical Computing iDASH (integrating Data for Analysis, ‘anonymization’ and SHaring) hosted the second Critical Assessment of Data Privacy and Protection competition to assess the capacity of cryptographic technologies for protecting computation over human genomes in the cloud and promoting cross-institutional collaboration. Data scientists were challenged to design and engineer practical algorithms for secure outsourcing of genome computation tasks in working software, whereby analyses are performed only on encrypted data. They were also challenged to develop approaches to enable secure collaboration on data from genomic studies generated by multiple organizations (e.g., medical centers) to jointly compute aggregate statistics without sharing individual-level records. The results of the competition indicated that secure computation techniques can enable comparative analysis of human genomes, but greater efficiency (in terms of compute time and memory utilization) are needed before they are sufficiently practical for real world environments.Electronic supplementary materialThe online version of this article (doi:10.1186/s12920-016-0224-3) contains supplementary material, which is available to authorized users.

Highlights

  • Advances in high throughput technologies have made it increasingly affordable to sequence the human genome in various settings, ranging from biomedical research and healthcare

  • Author summary Advancement of technology significantly reduces the price of obtaining whole genome sequencing (WGS) data and makes personalized genome analysis more affordable

  • We present the recent findings of novel genomic data protection methods through a community-wide open competition to address the emerging privacy challenges

Read more

Summary

Introduction

Advances in high throughput technologies have made it increasingly affordable to sequence the human genome in various settings, ranging from biomedical research and healthcare. Massive collection of human genomic data [1], together with the advancement of analysis techniques, may enable more effective clinical diagnosis, as well as the discovery of new treatments. Given such potential, there are numerous initiatives that have been established, the most recent of which is the Precision Medicine Initiative, which will aim at studying the combined genotypes and phenotypes of at least one million volunteers [2]. To take advantage of such environments, it was stated in the NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing (GDS) Policy (http://grants.nih.gov/grants/guide/notice-files/ NOT-OD-15-086.html) that the scientists and their institutions are responsible for ensuring the security of human genomic data (i.e., it is not a responsibility of the cloud service provider). Whatever care biomedical researchers might afford to potentially sensitive health information cannot replace the formal procedures required to secure such data and keep it protected

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call