Abstract

Huge DBMSs storing genomic information are being created and engineered to support large-scale, comprehensive, and in-depth analysis of human beings and their diseases. This paves the way for significant new approaches in medicine, but it also poses major challenges for storing, processing, and transmitting such large amounts of data in compliance with recent regulations concerning user privacy. We designed and implemented ER-index, a new full-text index in minute space, optimized for pattern search on compressed and encrypted genomic data using a reference sequence, which complements a previous index for reference-free genomics. Thanks to a multi-user, multiple-key encryption model, a single ER-index can store the sequences of a large population of individuals, so that users can perform search operations directly on the compressed data and only on the sequences to which they have been granted access. Tests performed on three different computing platforms show that the ER-index achieves very good compression ratios and search times, in many cases outperforming a reference tool that was proven nearly optimal in time and space and that does not implement encryption. The ER-index C++ source code, together with scripts and data to assess the tool's performance, is available at: https://github.com/EncryptedIndexes/erindex.

