Abstract

Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we present Balaur, a privacy-preserving read mapper for hybrid clouds based on locality-sensitive hashing and kmer voting. Balaur can securely outsource a substantial fraction of the computation to the public cloud while remaining highly competitive in accuracy and speed with non-private state-of-the-art read aligners on short read data. We also show that the method is significantly faster than the state of the art in long read mapping. Balaur can therefore enable institutions handling massive genomic data sets to shift part of their analysis to the cloud without sacrificing accuracy or exposing sensitive information to an untrusted third party.

Highlights

  • Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion

  • While deidentification and anonymization techniques have been suggested as solutions to the privacy concerns raised by cloud-based genome analysis, it has been shown that such techniques cannot reliably prevent the identification of an individual from genomic data[1,2,3,4]

  • In this work, we present Balaur, an efficient privacy-preserving read mapping technique for hybrid clouds based on locality-sensitive hashing and kmer voting

Summary

Introduction

Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. Chen et al.[14] proposed a secure seed-and-extend read mapping algorithm on hybrid clouds that splits the computation such that the public cloud finds the exact seed matches using encrypted seeds and the private cloud extends the seed matches using unencrypted data. With this approach, mapping 10 million 100 bp reads takes 370 CPU hours on the public cloud and an additional 2 h on the private cloud, and requires 6.8 TB to store the reference index. In a hybrid cloud system with a fast network, Balaur can result in substantial savings of private cloud resources with respect to standard nonsecure tools that must run entirely on the private cloud.
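
The public/private split in a seed-and-extend scheme of this kind can be illustrated with a toy sketch. This is not the implementation of Chen et al.[14] or of Balaur: it assumes HMAC-SHA256 as a stand-in for the seed encryption, a very short seed length, non-overlapping read seeds, and a Hamming-distance check as the extension step. All function names and parameters are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

SEED_LEN = 4  # toy value (assumption); real seeds are much longer


def keyed_seed(seed: str, key: bytes) -> str:
    """Keyed hash of one seed; HMAC-SHA256 stands in for the encryption step."""
    return hmac.new(key, seed.encode(), hashlib.sha256).hexdigest()


def build_public_index(reference: str, key: bytes) -> Dict[str, List[int]]:
    """Private side: hash every reference seed; only hashes are sent out."""
    index: Dict[str, List[int]] = {}
    for pos in range(len(reference) - SEED_LEN + 1):
        hashed = keyed_seed(reference[pos:pos + SEED_LEN], key)
        index.setdefault(hashed, []).append(pos)
    return index


def hash_read_seeds(read: str, key: bytes) -> List[Tuple[int, str]]:
    """Private side: hash non-overlapping read seeds, keeping their offsets."""
    return [
        (offset, keyed_seed(read[offset:offset + SEED_LEN], key))
        for offset in range(0, len(read) - SEED_LEN + 1, SEED_LEN)
    ]


def public_find_candidates(
    index: Dict[str, List[int]], hashed_seeds: List[Tuple[int, str]]
) -> List[int]:
    """Public side: exact matching on hashes only; returns candidate read starts."""
    candidates = set()
    for offset, hashed in hashed_seeds:
        for pos in index.get(hashed, []):
            start = pos - offset
            if start >= 0:
                candidates.add(start)
    return sorted(candidates)


def private_extend(
    reference: str, read: str, candidates: List[int], max_mismatches: int = 1
) -> List[Tuple[int, int]]:
    """Private side: verify candidates on plaintext with a Hamming-distance check."""
    hits = []
    for start in candidates:
        window = reference[start:start + len(read)]
        if len(window) != len(read):
            continue  # candidate runs off the end of the reference
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


key = b"demo-key"  # hypothetical secret held only by the private cloud
reference = "ACGTTGCAAGCTTAGGCTAACG"
read = "AGCTTAGG"

index = build_public_index(reference, key)
candidates = public_find_candidates(index, hash_read_seeds(read, key))
print(private_extend(reference, read, candidates))
```

In this sketch the public side only ever sees keyed hashes and positions, while the plaintext read and reference stay on the private side. Because the hash is deterministic, identical seeds still produce identical hashes, so repeat structure remains visible to the public side; a real scheme has to account for that.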
