Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores.

Brooks Paige, Aurélien Bellet, Adrià Gascón, James Bell, Daphne Ezer

doi:10.1089/cmb.2020.0445

Abstract

Some organizations such as 23andMe and the UK Biobank have large genomic databases that they re-use for multiple different genome-wide association studies. Even research studies that compile smaller genomic databases often utilize these databases to investigate many related traits. It is common for the study to report a genetic risk score (GRS) model for each trait within the publication. Here, we show that under some circumstances, these GRS models can be used to recover the genetic variants of individuals in these genomic databases (a reconstruction attack). In particular, if two GRS models are trained by using a largely overlapping set of participants, it is often possible to determine the genotype for each of the individuals who were used to train one GRS model, but not the other. We demonstrate this theoretically and experimentally by analyzing the Cornell Dog Genome database. The accuracy of our reconstruction attack depends on how accurately we can estimate the rate of co-occurrence of pairs of single nucleotide polymorphisms within the private database, so if this aggregate information is ever released, it would drastically reduce the security of a private genomic database. Caution should be applied when using the same database for multiple analyses, especially when a small number of individuals are included or excluded from one part of the study.
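
The intuition behind the overlapping-cohort leak can be illustrated with a small synthetic sketch. It is not the authors' procedure: it assumes each GRS is an ordinary least-squares fit without an intercept, that the two cohorts differ by exactly one individual, that genotypes are coded as 0, 1 or 2, and that the attacker knows the SNP co-occurrence matrix of the larger cohort exactly. All names and data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps = 200, 10

# Synthetic private database: genotypes coded as 0, 1 or 2 alternate alleles.
genotypes = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)
# Fix the target's genotype so the example is deterministic (illustrative values).
target_genotype = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0], dtype=float)
genotypes[-1] = target_genotype
trait = genotypes @ rng.normal(size=n_snps) + rng.normal(scale=0.5, size=n_people)


def fit_grs(X, y):
    """Least-squares GRS weights without an intercept (an assumed model form)."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Model A uses everyone; model B leaves out the last individual (the target).
weights_a = fit_grs(genotypes, trait)
weights_b = fit_grs(genotypes[:-1], trait[:-1])

# Assumption: the attacker knows the SNP co-occurrence (Gram) matrix of cohort A.
cooccurrence_a = genotypes.T @ genotypes

# Algebraically, G_A (w_A - w_B) = x_target * (y_target - x_target . w_B),
# so the leak is the target's genotype times an unknown scalar residual.
leak = cooccurrence_a @ (weights_a - weights_b)


def recover_genotype(scaled):
    """Undo the unknown scale, assuming the largest entry corresponds to a 2.

    If the target carries no SNP with two alternate alleles, the result is
    only correct up to a factor of two.
    """
    peak = scaled[np.argmax(np.abs(scaled))]
    return np.rint(2.0 * scaled / peak)


recovered = recover_genotype(leak)
print(recovered)
```

In this idealized setting the published weights and the co-occurrence matrix determine the excluded individual's genotype up to scale, which is consistent with the abstract's warning about releasing pairwise co-occurrence statistics. With an estimated rather than exact co-occurrence matrix, the recovery would be noisier.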

Highlights

We demonstrate that the kind of research output that is published from genome-wide association studies (GWAS) has the potential to leak enough information to recover the single nucleotide polymorphisms (SNPs) of individuals in the database, under specific circumstances.
We demonstrate a series of reconstruction attacks that enable us to infer the genotypes of individuals in private genomic databases, based on publicly released genetic risk score (GRS) models.
We demonstrate that private information is leaked when GRS models are published, in the case where two sets of largely overlapping individuals are used for multiple studies.

Summary

Introduction

In a survey of genomic privacy experts, the long-term privacy of genomic information was deemed both the most important and the most challenging problem. While much of the research on the long-term privacy of genomic databases focuses on the longevity of the encryption scheme [7], it is important to remember that these genomic databases are not just sitting on a server somewhere, but are being continually utilised for making new scientific discoveries. Each time these databases are accessed and the scientific results are published, there is a risk that information will be leaked and that eventually this would enable an attacker to reconstruct private information held in the database.

Journal: Journal of Computational Biology
Publication Date: Jan 5, 2021
Citations: 2
License type: cc-by
