Privacy-preserving top-k keyword similarity search over outsourced cloud data

Yiping Teng, Kai Shuang, Yulong Wang, Xiang Cheng, Sen Su

doi:10.1109/cc.2015.7385519

Abstract

In this paper, we study the problem of privacy-preserving top-k keyword similarity search over outsourced cloud data. Taking edit distance as the measure of similarity, we first build the similarity keyword sets for all the keywords in the data collection. We then calculate the relevance scores of the elements in the similarity keyword sets using the widely used tf-idf measure. Leveraging both the similarity keyword sets and the relevance scores, we present a new secure and efficient tree-based index structure for privacy-preserving top-k keyword similarity search. To prevent potential statistical attacks, we also introduce a two-server model that breaks the association between the index structure and the data collection held on the cloud servers. We analyze the correctness of the search functionality and give formal security proofs for the privacy guarantees of our solution. Experimental results on real-world data sets further demonstrate the effectiveness and efficiency of our solution.
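The two plaintext building blocks named in the abstract, edit-distance similarity keyword sets and tf-idf relevance scores, can be illustrated with a minimal sketch. It is not the authors' construction: it omits encryption, the tree-based index, and the two-server model entirely. The function names, the threshold parameter `d`, the specific tf-idf variant (term frequency normalized by document length, natural-log idf), and the rule of scoring a document by its best-matching similar keyword are all assumptions made for illustration.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similar_keywords(word, vocabulary, d):
    """Keywords in the vocabulary within edit distance d of word (hypothetical helper)."""
    return {v for v in vocabulary if edit_distance(word, v) <= d}


def similarity_sets(vocabulary, d):
    """Similarity keyword set for every keyword in the collection."""
    return {w: similar_keywords(w, vocabulary, d) for w in vocabulary}


def tf_idf(docs):
    """Map (keyword, doc_id) to a tf-idf score; assumed variant, see lead-in."""
    n = len(docs)
    df = Counter()
    for words in docs.values():
        df.update(set(words))  # document frequency counts each doc once
    scores = {}
    for doc_id, words in docs.items():
        for w, count in Counter(words).items():
            scores[(w, doc_id)] = (count / len(words)) * math.log(n / df[w])
    return scores


def top_k(query, k, docs, d=1):
    """Plaintext top-k: rank docs by their best-scoring keyword similar to query."""
    vocabulary = {w for words in docs.values() for w in words}
    matches = similar_keywords(query, vocabulary, d)
    best = {}
    for (w, doc_id), s in tf_idf(docs).items():
        if w in matches:
            best[doc_id] = max(best.get(doc_id, 0.0), s)
    # Sort by descending score, breaking ties by document id.
    return sorted(best, key=lambda doc_id: (-best[doc_id], doc_id))[:k]


# Toy collection (made up for this example).
docs = {
    "d1": ["cloud", "search", "cloud"],
    "d2": ["clout", "data"],
    "d3": ["privacy", "search"],
}
print(top_k("clouds", 1, docs))  # misspelled query still reaches "cloud"
```

In this toy collection, a query with a one-character typo still retrieves the document containing the intended keyword, which is the behavior similarity search is meant to provide. A secure scheme would have to deliver the same ranking without revealing the keywords, scores, or index-to-document associations to the server.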

Full Text