Abstract

The rapid progress in human-genome sequencing is making genomic data increasingly available. These data are notoriously sensitive, stable over time, and highly correlated among relatives. In this article, we study the implications of these familial correlations for kin genomic privacy. We formalize the problem and detail efficient reconstruction attacks based on graphical models and belief propagation. With our approach, an attacker can infer the genomes of the relatives of an individual whose genome or phenotype is observed, relying notably on Mendel’s laws, on statistical relationships between genomic variants, and on relationships between the genome and the phenotype. We evaluate the effect of these dependencies on privacy with respect to the number of observed variants and the relatives sharing them. We also study how the algorithmic performance evolves when these relationships are taken into account. Furthermore, to quantify the level of genomic privacy resulting from the proposed inference attack, we discuss possible definitions of genomic privacy metrics and compare their values and evolution. Genomic data also reveals Mendelian disorders and the likelihood of developing severe diseases, such as Alzheimer’s disease. We therefore also introduce the quantification of health privacy, that is, the measure of how well the predisposition to a disease is concealed from an attacker. We evaluate our approach on actual genomic data from a pedigree and show the extent of the threat by combining data gathered from a genome-sharing website and an online social network.
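To make the role of Mendel’s laws concrete, here is a minimal sketch for a single biallelic SNP in a mother, father, and child trio. It assumes Hardy-Weinberg priors set by a hypothetical minor-allele frequency `maf` and ignores linkage disequilibrium and phenotypes. Instead of belief propagation, it enumerates every joint assignment; because this small factor graph is a tree, belief propagation would return the same exact marginals, whereas the attack described above targets whole pedigrees and many SNPs. All function names are illustrative, not taken from the article.

```python
from itertools import product

# Sketch only: one biallelic SNP, Hardy-Weinberg priors from an assumed
# minor-allele frequency, no linkage disequilibrium, no phenotype.
# Genotypes are coded as the number of minor alleles (0, 1 or 2).
GENOTYPES = (0, 1, 2)
ROLES = ("mother", "father", "child")


def hwe_prior(maf):
    """Hardy-Weinberg genotype frequencies for a minor-allele frequency `maf`."""
    p, q = 1.0 - maf, maf
    return {0: p * p, 1: 2 * p * q, 2: q * q}


def mendel(child, mother, father):
    """P(child genotype | parental genotypes) under Mendel's first law."""
    pm, pf = mother / 2.0, father / 2.0  # chance each parent transmits the minor allele
    probs = {
        0: (1 - pm) * (1 - pf),
        1: pm * (1 - pf) + (1 - pm) * pf,
        2: pm * pf,
    }
    return probs[child]


def trio_marginals(observed, maf):
    """Exact posterior marginals for every trio member, given observed genotypes.

    `observed` maps any subset of ROLES to a genotype in GENOTYPES.
    """
    prior = hwe_prior(maf)
    marg = {r: {g: 0.0 for g in GENOTYPES} for r in ROLES}
    total = 0.0
    for m, f, c in product(GENOTYPES, repeat=3):
        assign = dict(zip(ROLES, (m, f, c)))
        # Skip joint assignments that contradict the attacker's observations.
        if any(assign[r] != g for r, g in observed.items()):
            continue
        # Parents get Hardy-Weinberg priors; the child gets the Mendelian factor.
        w = prior[m] * prior[f] * mendel(c, m, f)
        total += w
        for r in ROLES:
            marg[r][assign[r]] += w
    if total == 0.0:
        raise ValueError("observations are inconsistent with Mendelian inheritance")
    return {r: {g: w / total for g, w in dist.items()} for r, dist in marg.items()}


# Usage: the mother (genotype 0) and the child (genotype 1) are observed.
print(trio_marginals({"mother": 0, "child": 1}, maf=0.2)["father"])
```

Because the mother carries no minor allele, the child's minor allele must come from the father, so the sketch puts zero mass on the father's genotype 0 and splits the remainder as 1 - maf on genotype 1 and maf on genotype 2.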

Highlights

  • We update the results of the inference attack by conducting several new experiments.

  • We thoroughly evaluate the relation between the various metrics and draw conclusions about the most appropriate metric in different settings.

  • We carry out new experiments using the phenotypic information disclosed by OpenSNP users together with their genomic data.

  • We include a performance evaluation and a discussion of potential improvements to the proposed inference attacks.

  • We first evaluate the performance of the proposed inference attack, compare the entropy-based metrics with the expected estimation error, and evaluate the accuracy of the inference attack with and without considering linkage disequilibrium (LD) between single nucleotide polymorphisms (SNPs).
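Comparing entropy-based metrics with the expected estimation error matters because entropy measures only how uncertain the attacker is, while the estimation error also depends on the true genotype. The minimal sketch below assumes genotypes coded 0, 1 or 2, an absolute-difference error, and entropy normalized by its maximum log(3); the function names are hypothetical and the article's exact metric definitions may differ.

```python
import math

# Sketch only: `posterior` maps each genotype (0, 1 or 2 minor alleles) to the
# attacker's inferred probability, e.g. the output of an inference attack.


def expected_estimation_error(posterior, true_genotype):
    """Attacker's expected error: sum over x of P(x) * |x - true_genotype| (assumed form)."""
    return sum(p * abs(x - true_genotype) for x, p in posterior.items())


def normalized_entropy(posterior):
    """Shannon entropy of the posterior divided by its maximum, log(number of genotypes)."""
    h = sum(p * math.log(1.0 / p) for p in posterior.values() if p > 0)
    return h / math.log(len(posterior))


# A confident but wrong attacker: zero entropy, yet the largest possible error.
wrong = {0: 0.0, 1: 0.0, 2: 1.0}
print(normalized_entropy(wrong), expected_estimation_error(wrong, true_genotype=0))
# → 0.0 2.0
```

An entropy of 0 would suggest no privacy is left, while the estimation error shows the attacker is maximally wrong, which is why the two kinds of metric can lead to different conclusions.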

Summary

Introduction

Thanks to the plummeting costs of molecular profiling, biomedical researchers have access to an increasing amount of genomic data, a key enabler toward a more personalized, precise, and predictive medicine. An increasing number of individuals share their genomes online, sometimes with their real identifiers (e.g., on OpenSNP.org [Greshake et al 2014]). Access to such sensitive data can lead to discrimination in access to insurance and employment [Ayday et al 2015]. With the decreasing cost of DNA sequencing, genomic data is currently used mainly in two areas: (i) clinical diagnostics, for personalized genomic medicine and genetic research (e.g., genome-wide association studies), and (ii) direct-to-consumer genomics, for genetic risk estimation of various diseases or for recreational activities such as ancestry search. It has been reported that two particular SNPs (rs7412 and rs429358) on the Apolipoprotein E (ApoE) gene indicate an

