Privacy-aware Character Pattern Matching over Outsourced Encrypted Data

Nicholas Mainardi, Gerardo Pelosi, Alessandro Barenghi

doi:10.1145/3462333

Abstract

Providing a method to efficiently search outsourced encrypted data, without forsaking strong privacy guarantees, is a pressing concern arising from the separation of data ownership and data management typical of cloud-based applications. While several existing solutions allow a client to look up the occurrences of a substring in an outsourced document collection, practical application requirements in terms of privacy and efficiency call for the improvement of such solutions. In this work, we present a privacy-preserving substring search protocol with a polylogarithmic communication cost and a limited computational effort on the server side. The proposed protocol provides search pattern and access pattern privacy, for both exact string search and character-pattern search with wildcards. Its extension to a multi-user setting shows significant savings in terms of outsourced storage with respect to a baseline solution where the whole dataset is replicated. The performance figures of an optimized implementation of our protocol, searching a remotely stored genomic dataset, validate the practicality of the approach, exhibiting a data transfer of less than 50 kiB to execute a query over a document of 40 MiB, with execution times on client and server in the range of a few seconds and a few minutes, respectively.

Highlights

The significant improvements in reliability and total cost of ownership provided by remote data management services have proven in the field to be beneficial to a large variety of enterprises.
Relying on the substring search algorithm based on the Burrows-Wheeler Transform (BWT) reported in Algorithm 1 and on Lipmaa's Private Information Retrieval (PIR) protocol based on the Flexible Length Additive Homomorphic Encryption (FLAHE) Paillier scheme, we provide the operational description of the proposed privacy-preserving substring search (PPSS) protocol, reported in Algorithm 5 and Algorithm 6.
We evaluated our enhanced protocol with the batched retrieval method, which fetches all the occurrences in a single round of communication. To compare this approach fairly with the non-batched one, we report in Figure 4 the amortized client, server, and communication costs, i.e., the costs of the batched retrieval solution divided by the total number of occurrences o_q.
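
The BWT-based substring search that the highlights refer to can be illustrated, in plaintext only, with the classic backward-search procedure over a BWT index. The following is a minimal sketch and not the paper's Algorithm 1: the function names (`build_index`, `backward_search`), the naive suffix-array construction, and the `$` sentinel (assumed absent from the document and smaller than all its characters) are illustrative assumptions. It performs no encryption and no PIR; in the protocol described above, the index lookups would be retrieved privately rather than read directly.

```python
from collections import Counter


def build_index(doc, sentinel="$"):
    """Build a plaintext BWT index (suffix array, C table, occurrence table).

    Assumes `sentinel` does not occur in `doc` and sorts before every
    character of `doc`. The suffix array is built naively for clarity.
    """
    text = doc + sentinel
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to the sentinel).
    bwt = "".join(text[i - 1] for i in sa)

    # C[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return the sorted starting positions of `pattern` in the document."""
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, C, occ = build_index("banana")
    print(backward_search("ana", sa, C, occ))
```

Each pattern character costs two lookups in the occurrence table, so the number of index accesses grows with the pattern length rather than with the document length; this access structure is what makes the approach a natural fit for retrieving index entries through PIR.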

Summary

Introduction

The significant improvements in reliability and total cost of ownership provided by remote data management services have proven in the field to be beneficial to a large variety of enterprises. In this context, a company relies on cloud-based services to store a significant amount of its own data in an infrastructure located in a third-party data center, beyond its direct control. There is a pressing need for effective solutions that enable a set of querying functionalities on encrypted data, possibly by multiple users, while preserving the confidentiality of the searched information even against the service (storage) provider itself.

Objectives

Results

Conclusion