Abstract

To answer the need for the rigorous protection of biomedical data, we organized the Critical Assessment of Data Privacy and Protection initiative as a community effort to evaluate privacy-preserving dissemination techniques for biomedical data. We focused on the challenge of sharing aggregate human genomic data (e.g., allele frequencies) in a way that preserves the privacy of the data donors, without undermining the utility of genome-wide association studies (GWAS) or impeding their dissemination. Specifically, we designed two problems for disseminating the raw data and the analysis outcome, respectively, based on publicly available data from HapMap and from the Personal Genome Project. A total of six teams participated in the challenges. The final results were presented at a workshop of the iDASH (integrating Data for Analysis, 'anonymization,' and SHaring) National Center for Biomedical Computing. We report the results of the challenge and our findings about the current genome privacy protection techniques.

Highlights

  • The biomedical community is evolving to benefit from and to contribute to big data science

  • To investigate the extent to which data perturbation technologies are practical, we organized the Critical Assessment of Data Privacy and Protection (CADPP) Workshop as a community effort to evaluate the effectiveness of these methodologies for genomic data

  • There were still genotypes missing for some Personal Genome Project (PGP) participants, which were imputed based on allele frequencies from the CEU population in HapMap using fastPHASE, a commonly utilized phasing tool [28]

Read more

Summary

Introduction

The biomedical community is evolving to benefit from and to contribute to big data science. To investigate the extent to which data perturbation technologies are practical, we organized the Critical Assessment of Data Privacy and Protection (CADPP) Workshop as a community effort to evaluate the effectiveness of these methodologies (in particular differential privacy approaches [20]) for genomic data This workshop was organized around two specific problems. Teams focused on the challenge of sharing aggregate human genomic data (e.g., allele frequencies) to preserve the privacy of the data donors, without undermining the utility of the data in genome-wide association studies (GWAS). This means that, in practice, the most significant genomic regions identified by a GWAS were preserved after data perturbation. Two companion papers in this issue of the journal [22,23] written by participating teams provide details of the data perturbation techniques that were used in the challenge tasks

Background
Results and discussions
Conclusion
Ohno-Machado L
20. Dwork C
23. Church GM
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.