Abstract

This paper addresses the problem of sharing person-specific genomic sequences without violating the privacy of their data subjects, in order to support large-scale biomedical research projects. The proposed method builds on the framework but extends the results in a number of ways. One improvement is that our scheme is deterministic, with zero probability of a wrong answer. We also provide a new operating point in the space-time tradeoff by offering a scheme that is twice as fast as theirs but uses twice the storage space. This point is motivated by the fact that storage is cheaper than computation in current cloud computing pricing plans. Moreover, our encoding of the data allows us to handle a richer set of queries than exact matching between the query and each sequence of the database, including: (i) counting the number of matches between the query symbols and a sequence; (ii) logical OR matches, where a query symbol is allowed to match a subset of the alphabet, thereby making it possible to handle a "not equal to" requirement for a query symbol; (iii) support for the extended alphabet of nucleotide base codes that encompasses ambiguities in DNA sequences; (iv) queries that specify the number of occurrences of each kind of symbol in the specified sequence positions; and (v) a threshold query whose answer is "yes" if the number of matches exceeds a query-specified threshold. (vi) For all query types, we can hide the answers from the decrypting server, so that only the client learns the answer. (vii) In all cases, the client deterministically learns only the query's answer, except for query type (v), where we quantify the (very small) statistical leakage to the client of the actual count.
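
The query types (i) to (v) can be illustrated on unencrypted data. The following is only a plaintext sketch of what each query asks; it does not model the encoding, the encryption, or the server protocol of the scheme. The query representation (a mapping from sequence position to a set of allowed bases) and all function names are assumptions made for this illustration, and the ambiguity table uses the standard IUPAC nucleotide codes.

```python
from collections import Counter

# Plaintext illustration only; nothing here is encrypted.
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def not_equal(base):
    """Query type (ii): 'not equal to base' as an OR over the other bases."""
    return set("ACGT") - {base}

def from_code(code):
    """Query type (iii): expand an IUPAC code into the set of bases it allows."""
    return set(IUPAC[code])

def count_matches(sequence, query):
    """Query type (i): number of query positions whose base is allowed.

    `query` maps a 0-based position to a set of allowed bases (hypothetical format).
    """
    return sum(1 for pos, allowed in query.items() if sequence[pos] in allowed)

def exact_match(sequence, query):
    """Baseline exact matching: every query position must match."""
    return count_matches(sequence, query) == len(query)

def symbol_counts(sequence, positions):
    """Query type (iv): occurrences of each symbol at the given positions."""
    return Counter(sequence[p] for p in positions)

def threshold_match(sequence, query, threshold):
    """Query type (v): True if the match count exceeds the threshold."""
    return count_matches(sequence, query) > threshold

sequence = "ACGTAC"
query = {0: {"A"}, 2: not_equal("G"), 3: from_code("Y")}
matches = count_matches(sequence, query)
```

In this example, positions 0 and 3 match while position 2 does not (the sequence has G there and the query demands "not G"), so the count is 2, the exact match fails, and a threshold query succeeds for a threshold of 1 but not for a threshold of 2.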

