Abstract

With the decreasing cost and increasing availability of human genome sequencing, genomic privacy has become an important issue. Several methods have been proposed in the literature to address this problem, including cryptographic and privacy-preserving data mining methods such as homomorphic encryption and cryptographic hardware. In a recent work, Barman et al. studied privacy threats and practical solutions in an SNP-based scenario. The authors introduced a new protocol for a setting in which a malicious medical center mounts an active attack in order to retrieve the genomic data of a given patient. The authors state that this protocol provides a trade-off between privacy and practicality. In this paper, we first give an overview of the system for SNP-based risk calculation. We provide the definitions of the privacy threats and briefly describe Barman et al.'s protocol and their solution. The authors proposed using a weighted sum of SNP coefficients to calculate disease tendency. They argue that the specific choice of the bases would prevent unique identification of SNPs. Our main observation is that this claim does not hold. Contrary to the security claim, SNP combinations can be identified uniquely in many different scenarios. Our method exploits a pre-computed look-up table to retrieve the SNP values from the test result. An attacker can obtain all SNP values of a given patient by using the pre-computed look-up table. We provide practical examples of weights and pre-computed tables. We also show that even when the table is too large for the attacker to handle at one time, the attacker can still gather information using multiple queries. Our work shows that more realistic attack scenarios must be considered in the design of genetic security systems.
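The look-up table idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the encoding of each SNP as 0, 1, or 2, and the integer weights below are assumptions chosen for illustration only. The sketch enumerates every SNP combination, indexes the combinations by their weighted sum, and then inverts an observed test result.

```python
from itertools import product


def build_lookup_table(weights, snp_values=(0, 1, 2)):
    """Map each achievable weighted sum to the SNP combinations producing it.

    Assumption: each SNP takes a value in snp_values and the test result
    is the plain weighted sum of those values.
    """
    table = {}
    for combo in product(snp_values, repeat=len(weights)):
        score = sum(w * x for w, x in zip(weights, combo))
        table.setdefault(score, []).append(combo)
    return table


def recover_snps(table, test_result):
    """Return every SNP combination consistent with the observed result."""
    return table.get(test_result, [])


# Hypothetical weights, not taken from the paper.
weights = [1, 3, 9, 27]
table = build_lookup_table(weights)

# A hypothetical patient and the test result the attacker would observe.
patient = (2, 0, 1, 2)
result = sum(w * x for w, x in zip(weights, patient))

candidates = recover_snps(table, result)
print(candidates)  # [(2, 0, 1, 2)]
```

With these particular weights every sum corresponds to exactly one combination, so the test result alone reveals all SNP values. For other weights a sum may map to several candidates, in which case `recover_snps` returns all of them and the attacker only narrows the possibilities.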

Highlights

  • Recent developments in high-throughput sequencing technologies have led to a decrease in the cost of genomic sequencing

  • Djatmiko et al. [9] proposed a privacy-preserving algorithm to compute genomic tests that need a linear combination of single nucleotide polymorphism (SNP) values

  • The authors state that once the data center (DC) has made sure that the test is legitimate, it computes the encryption of the partial test result, ENC(G2)

Summary

INTRODUCTION

Recent developments in high-throughput sequencing technologies have led to a decrease in the cost of genomic sequencing. Since genomic data includes sensitive information about individuals and their relatives, using this data efficiently with privacy-preserving techniques has become an important issue. Hospitals may not be able to prevent cyber-attacks because they lack highly skilled workers and adequate technology. One solution to this problem is to store and process genomic data in a privacy-protected manner at a third-party service provider. The human genome consists of four different nucleotides (A, C, G, T). These nucleotides form about 20,000 to 25,000 genes responsible for producing various types of proteins, which perform their roles inside cells throughout life. Since SNPs form the nonredundant part of the genome and contain minimal information, it makes sense to consider privacy-preserving protocols in terms of SNPs. Several methods have been proposed in the literature to address these problems, including cryptographic and privacy-preserving data mining methods such as homomorphic encryption [3] and cryptographic hardware [4], [5].

Related Work
Our Contributions
SYSTEM MODEL
Test Inference Attacks
Passive SNP Retrieval Attack
Active SNP Retrieval Attack
PROPOSED ACTIVE SNP RETRIEVAL ATTACK
COUNTERMEASURES
CONCLUSION