Abstract

The advent of next-generation sequencing (NGS) machines made DNA sequencing cheaper, but also put pressure on the genomic life-cycle, which includes aligning millions of short DNA sequences, called reads, to a reference genome. On the performance side, efficient algorithms have been developed and parallelized on public clouds. On the privacy side, since genomic data are highly sensitive, several cryptographic mechanisms have been proposed to align reads more securely than these performance-oriented algorithms, but with lower performance. This paper presents DNA-SeAl, a novel contribution to improving the privacy × performance product in current genomic workflows. First, building on recent works arguing that genomic data need to be treated according to a threat-risk analysis, we introduce a multi-level sensitivity classification of genomic variations designed to prevent the amplification of possible privacy attacks. We show that the use of sensitivity levels reduces future re-identification risks, and that partitioning data across these levels helps prevent linkage attacks. Second, after extending this classification to reads, we show how to align and store reads using different security levels. To do so, DNA-SeAl extends a recent reads filter to classify unaligned reads into sensitivity levels, and adapts existing alignment algorithms to each read's sensitivity. We show that DNA-SeAl yields high performance gains while enforcing high privacy levels in hybrid cloud environments.
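The abstract does not spell out how the extended reads filter assigns a level to each read. The following is a minimal sketch under stated assumptions: each sensitivity level is represented by the set of k-mers of the genomic sequences classified at that level, and a read inherits the most sensitive level that any of its k-mers matches. The level names, the value of `k`, and every function name here are hypothetical. Plain Python sets stand in for the space-efficient filter a real implementation would likely use, and reverse complements and sequencing errors are ignored.

```python
from typing import Dict, Iterable, Iterator, List, Set

# Hypothetical sensitivity levels, ordered from least to most sensitive.
LEVELS = ["public", "low", "medium", "high"]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(seqs_by_level: Dict[str, Iterable[str]], k: int) -> Dict[str, Set[str]]:
    """Map each sensitivity level to the set of k-mers of its sequences."""
    return {
        level: {km for seq in seqs for km in kmers(seq, k)}
        for level, seqs in seqs_by_level.items()
    }


def classify_read(read: str, index: Dict[str, Set[str]], k: int) -> str:
    """Return the most sensitive level sharing at least one k-mer with the read."""
    read_kmers = set(kmers(read, k))
    for level in reversed(LEVELS):  # check the most sensitive level first
        if read_kmers & index.get(level, set()):
            return level
    return LEVELS[0]  # no match: fall back to the least sensitive level


def partition_reads(reads: Iterable[str], index: Dict[str, Set[str]], k: int) -> Dict[str, List[str]]:
    """Group reads by level so each group can be aligned and stored separately."""
    groups: Dict[str, List[str]] = {level: [] for level in LEVELS}
    for read in reads:
        groups[classify_read(read, index, k)].append(read)
    return groups


if __name__ == "__main__":
    # Hypothetical sequences assigned to two of the levels.
    index = build_index({"high": ["ACGTACGTAA"], "medium": ["TTTTGGGGCC"]}, k=5)
    reads = ["GGACGTACGG", "AATTTTGGGA", "CCCCCCCCCC"]
    for level, group in partition_reads(reads, index, k=5).items():
        print(level, group)
```

Sending each group to a different environment (for instance, the least sensitive reads to a public cloud and the rest to a private one) is one way to realize the hybrid-cloud split described above; which level goes where is a policy choice this sketch leaves open.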

Highlights

  • DNA sequencing and the alignment of sequences are at the heart of applications such as precision medicine, forensics, …

  • The promotion of genomic variations slightly changes the distribution of variations among the sensitivity levels, as Figure 4b shows

  • We evaluated the increased privacy protection that the use of sensitivity levels can bring to genomic data using two metrics: the genomic privacy metric and the Likelihood Ratio (LR) value
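The highlights do not give the formulas behind either metric. To illustrate the LR side only, and assuming the standard forensic formulation rather than the authors' exact definition, the sketch below computes LR = 1/RMP for a single-source genotype match. Here the random match probability (RMP) multiplies Hardy-Weinberg genotype frequencies over biallelic loci that are treated as independent. The allele frequencies, genotypes, and function names are hypothetical.

```python
from math import prod
from typing import Iterable, Tuple


def genotype_frequency(p: float, genotype: int) -> float:
    """Hardy-Weinberg frequency of a biallelic genotype.

    p is the alternate-allele frequency; genotype is the number of
    alternate alleles carried (0, 1 or 2).
    """
    q = 1.0 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}[genotype]


def likelihood_ratio(loci: Iterable[Tuple[float, int]]) -> float:
    """LR = 1 / RMP, where RMP multiplies genotype frequencies over loci.

    Multiplying assumes the loci are independent (no linkage disequilibrium).
    """
    rmp = prod(genotype_frequency(p, g) for p, g in loci)
    return 1.0 / rmp


if __name__ == "__main__":
    # Hypothetical profile: (alternate-allele frequency, genotype) per locus.
    profile = [(0.1, 2), (0.3, 1), (0.5, 0), (0.2, 1)]
    print("all loci disclosed:", likelihood_ratio(profile))
    print("last two loci only:", likelihood_ratio(profile[2:]))
```

Comparing the LR when every locus is disclosed with the LR when only a subset is disclosed (for example, only the loci of a less sensitive level) shows the intended effect: fewer disclosed loci give weaker evidence for a match. The independence assumption is a simplification. Correlated loci are precisely what linkage attacks exploit, so this sketch does not capture them.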


Introduction

DNA sequencing and the alignment of sequences are at the heart of applications such as precision medicine, forensics, …

Manuscript received July 26, 2018; revised December 17, 2018, March 26, 2019, and April 18, 2019; accepted April 29, 2019. Date of publication June 28, 2019; date of current version March 6, 2020. UID/CEC/00408/2019 and DeST: Deep Semantic Tagger project, ref. Couto is with the Laboratorio de Sistemas Informaticos de Grande Escala, Departamento de Informatica, Faculdade de Ciencias, Universidade de Lisboa, 1749-016 Lisboa, Portugal (e-mail: fjcouto@ciencias.ulisboa.pt).
