Abstract

Efficiently searching outsourced encrypted data without forsaking strong privacy guarantees is a pressing concern, arising from the separation of data ownership and data management typical of cloud-based applications. Several existing solutions allow a client to look up the occurrences of a substring in an outsourced document collection, but practical privacy and efficiency requirements call for improving on them. In this work, we present a privacy-preserving substring search protocol with polylogarithmic communication cost and limited computational effort on the server side. The proposed protocol provides search pattern and access pattern privacy for both exact string search and character-pattern search with wildcards. Its extension to a multi-user setting yields significant savings in outsourced storage with respect to a baseline solution in which the whole dataset is replicated. Performance figures from an optimized implementation of our protocol, searching a remotely stored genomic dataset, validate the practicality of the approach: a query over a 40 MiB document transfers less than 50 kiB of data, with execution times in the range of a few seconds on the client and a few minutes on the server.

Highlights

  • The significant improvements in reliability and total cost of ownership provided by remote data management services have proven in the field to be beneficial to a large variety of enterprises

  • Relying on the substring search algorithm based on the Burrows-Wheeler Transform (BWT), reported in Algorithm 1, and on Lipmaa's Private Information Retrieval (PIR) protocol based on the Flexible Length Additive Homomorphic Encryption (FLAHE) Paillier scheme, we provide the operational description of the proposed privacy-preserving substring search (PPSS) protocol, reported in Algorithm 5 and Algorithm 6

  • We evaluated our enhanced protocol with the batched retrieval method, which fetches all the occurrences in a single round of communication. To compare this approach fairly with the non-batched one, we report in Figure 4 the amortized client, server, and communication costs, i.e., the costs of the batched retrieval solution divided by the total number of occurrences o_q
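
As an illustration of the BWT-based substring search that the highlights refer to, the following is a minimal plaintext sketch of the standard backward-search procedure. It is not the paper's Algorithm 1 and offers no privacy protection: the names `build_index` and `backward_search`, the `$` sentinel, and the naive suffix sorting are assumptions made for this example only. The sketch also does not show how the protocol combines such lookups with PIR.

```python
def build_index(text, sentinel="$"):
    """Build a toy BWT index (suffix array, C table, occurrence table).

    Assumes `sentinel` does not occur in `text` and sorts below every
    character of `text`. Suffix sorting here is naive and meant only
    for small inputs.
    """
    s = text + sentinel
    # Suffix array: starting positions of the suffixes in sorted order.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(s[i - 1] for i in sa)
    alphabet = sorted(set(s))
    # C[c] = number of characters in s that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(s) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return the sorted start positions of `pattern` in the indexed text."""
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, C, occ = build_index("ACGTACGTAC")
    print(backward_search("ACG", sa, C, occ))
```

Each character of the pattern triggers two lookups in the occurrence table, so the number of index accesses grows with the pattern length rather than with the document length.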


Summary

Introduction

The significant improvements in reliability and total cost of ownership provided by remote data management services have proven in the field to be beneficial to a large variety of enterprises. In this context, a company relies on cloud-based services to store a significant amount of its own data in an infrastructure located in a third-party data center, beyond its direct control. There is a pressing need for effective solutions that enable a set of querying functionalities on encrypted data, possibly by multiple users, while preserving the confidentiality of the searched information even against the service (storage) provider itself.

