Abstract

Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database.Approach: We propose a novel approach that combines efficient string data structures such as the Burrows–Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows–Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries.Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within 4.6 s and 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm.Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec.Contacts: shimizu-kana@aist.go.jp or Gunnar.Ratsch@ratschlab.orgSupplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • String search is a fundamental task in the field of genome informatics, for which a large variety of techniques have been developed

  • In Experiments, we evaluate the performance of PBWT-sec on datasets created from data of the 1,000 Genomes Project [30] and compare it to an alternative method for fixed-length k-mer search

  • We propose to use an algorithm for sublinear-communication Oblivious transfer (OT) (SC-OT) proposed in [32]

Read more

Summary

Introduction

String search is a fundamental task in the field of genome informatics, for which a large variety of techniques have been developed (see, for instance,[2, 17, 20]) Those techniques have been optimized for accuracy and computational efficiency, a recent boom of personal genome sequencing and analyses has spotlighted a new criteria, namely, privacy protection. As reported in many studies, a genome is considered to be one of the most critical pieces of information for an individual’s privacy It is largely different from any other personal information because it works as an identifier of an individual while it possesses the information that has strong correlation with the phenotype of the individual [27, 11].

Objectives
Methods
Findings
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.