Abstract

Motivation: As genomics moves into the clinic, there has been much interest in using this medical data for research. At the same time the use of such data raises many privacy concerns. These circumstances have led to the development of various methods to perform genome-wide association studies (GWAS) on patient records while ensuring privacy. In particular, there has been growing interest in applying differentially private techniques to this challenge. Unfortunately, up until now all methods for finding high scoring SNPs in a differentially private manner have had major drawbacks in terms of either accuracy or computational efficiency.

Results: Here we overcome these limitations with a substantially modified version of the neighbor distance method for performing differentially private GWAS, and thus are able to produce a more viable mechanism. Specifically, we use input perturbation and an adaptive boundary method to overcome accuracy issues. We also design and implement a convex analysis based algorithm to calculate the neighbor distance for each SNP in constant time, overcoming the major computational bottleneck in the neighbor distance method. It is our hope that methods such as ours will pave the way for more widespread use of patient data in biomedical research.

Availability and implementation: A Python implementation is available at http://groups.csail.mit.edu/cb/DiffPriv/.

Contact: bab@csail.mit.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Genome-wide association studies (GWAS) are a cornerstone of genotype–phenotype association in humans. These studies use various statistical tests to measure which polymorphisms in the genome are important for a given phenotype and which are not

  • With the increasing collection of genomic data in the clinic, there has been a push towards using this information to validate classical GWAS findings and generate new ones (Weber et al, 2009)

Summary

Introduction

Genome-wide association studies (GWAS) are a cornerstone of genotype–phenotype association in humans. There is growing concern that the results of these studies might lead to loss of privacy for those who participate in them (Erlich and Narayanan, 2014; Homer et al, 2008; Lumley and Rice, 2010). These privacy concerns have led some to suggest using statistical tests that are differentially private (Jiang et al, 2014; Johnson and Shmatikov, 2013; Tramer et al, 2015; Uhler et al, 2013; Wang et al, 2014; Yu and Ji, 2014; Yu et al, 2014). Recent work has suggested that differentially private methods can also help avoid overfitting and related problems that plague much of biomedical science (Dwork et al, 2015). These gains have traditionally come at a high cost in utility and efficiency. To balance utility and privacy, new methods are needed that provide greater utility than current methods while achieving equal or greater privacy.
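
To make the notion of a differentially private statistic concrete, the following is a minimal sketch of the Laplace mechanism, a standard building block of differential privacy. It is not the neighbor distance method described in the abstract. The function name `laplace_count`, the carrier counts, and the choice of epsilon are all hypothetical and used for illustration only.

```python
import random


def laplace_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Assumption: the statistic is the number of study participants carrying
    the minor allele at one SNP. Changing one participant changes this
    count by at most 1, so the sensitivity is 1 and the noise scale is
    1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # The difference of two independent Exponential(rate = 1/scale) draws
    # follows a Laplace distribution with location 0 and the given scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)
# Hypothetical carrier counts at three SNPs (illustrative values only).
carrier_counts = [120, 45, 300]
noisy_counts = [laplace_count(c, epsilon=1.0, rng=rng) for c in carrier_counts]
print(noisy_counts)
```

Each call here spends its own privacy budget, so releasing three counts at epsilon = 1.0 costs a total of 3.0 under basic composition. Smaller epsilon values give stronger privacy but noisier counts, which is the utility cost the paragraph above refers to.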

