Abstract
With the sustained evolution and rapid adoption of cloud computing, a growing number of individuals and enterprises outsource data to cloud servers to reduce management overhead and simplify access. Privacy requirements demand that sensitive information be encrypted before outsourcing, which, however, reduces the usability of the data and renders many efficient plaintext keyword search techniques inapplicable. In this paper, we propose a secure multi-keyword ranked search scheme based on document similarity to address this problem. To support multi-keyword search and ranking of search results, we adopt the vector space model and the TF-IDF model to generate index and query vectors. By introducing the secure kNN computation, index and query vectors are encrypted to prevent cloud servers from obtaining sensitive frequency information. To improve efficiency, we adopt the $B^{+}$-tree as the basic structure of the index and construct a similar-document collection for each document. Owing to this index structure, search is more efficient than linear search. Extensive experiments on a real-world document collection demonstrate the feasibility and efficiency of the proposed solution.
Highlights
Cloud computing [1] has achieved extraordinary development over the past decade, both in the academic and industrial communities [2].
To address the problems mentioned above in multi-keyword ranked search, in this paper we propose a secure and efficient multi-keyword ranked search scheme based on a B+-tree index, a structure that has been extensively applied in database systems.
The vector space model, in combination with the TF-IDF model, is extensively employed to support efficient multi-keyword ranked search in plaintext information retrieval [41] [46]. TF is used to evaluate the importance of a specific term in a document: the more times a word appears in a document, the more important it is to that document. IDF is used to measure the ability of a keyword to distinguish documents.
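The TF and IDF roles described above can be illustrated on plaintext with a minimal sketch. It assumes relative term frequency for TF, `ln(N / df)` for IDF, TF values stored in the index vector and IDF values in the query vector; these weighting choices, the function names, and the toy documents are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter

def tf_vector(doc_terms, vocab):
    """Index vector: relative frequency of each vocabulary term in one document."""
    counts = Counter(doc_terms)
    total = len(doc_terms)
    return [counts[t] / total for t in vocab]

def idf_query_vector(query_terms, docs, vocab):
    """Query vector: IDF weight for queried terms, 0 for all other terms."""
    n = len(docs)
    vec = []
    for t in vocab:
        df = sum(1 for d in docs if t in d)  # number of documents containing t
        vec.append(math.log(n / df) if t in query_terms and df > 0 else 0.0)
    return vec

def score(index_vec, query_vec):
    """Relevance score: inner product of the index and query vectors."""
    return sum(i * q for i, q in zip(index_vec, query_vec))

# Hypothetical toy collection, already tokenized.
docs = [
    ["cloud", "search", "cloud", "index"],
    ["privacy", "search", "keyword"],
    ["cloud", "privacy", "privacy", "privacy"],
]
vocab = sorted({t for d in docs for t in d})
index = [tf_vector(d, vocab) for d in docs]
query = idf_query_vector({"cloud", "index"}, docs, vocab)

# Rank document ids by descending relevance to the multi-keyword query.
ranking = sorted(range(len(docs)), key=lambda i: score(index[i], query), reverse=True)
print(ranking)
```

In this toy collection, the first document contains both query keywords and ranks first, while the second contains neither and scores zero.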
Summary
Cloud computing [1] has achieved extraordinary development over the past decade, both in the academic and industrial communities [2]. Some general-purpose methodologies based on fully homomorphic encryption [12] and oblivious RAMs [13] have been proposed to address the above problem, but the computation and communication overhead of these schemes is unacceptable for both cloud servers and users. Some constructive schemes based on multi-keyword ranked search have been proposed to support intelligent and economical queries over encrypted cloud data. To defend against attacks launched by cloud servers under different threat models, we design two secure index schemes, namely the basic similarity-based multi-keyword ranked search (BSMRS) scheme and the enhanced similarity-based multi-keyword ranked search (ESMRS) scheme. The former guarantees the confidentiality of index and query vectors; the latter prevents cloud servers from obtaining sensitive frequency information, satisfying more stringent privacy protection requirements.
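The secure kNN computation named in the abstract can be sketched as follows. This is the generic split-and-multiply construction as commonly described in the literature, not the BSMRS or ESMRS scheme itself: the key is a bit vector `s` and two invertible matrices, and the inner product of the encrypted index and query vectors equals that of the plaintext vectors. All function names are hypothetical, exact `Fraction` arithmetic is an illustrative choice, and the dimension extension and random noise that practical schemes add are omitted.

```python
import random
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def transpose(m):
    return [list(row) for row in zip(*m)]

def mat_vec(m, v):
    return [dot(row, v) for row in m]

def inverse(m):
    """Gauss-Jordan inverse over exact fractions; raises StopIteration if singular."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_invertible(n, rng):
    """Draw random integer matrices until one is invertible."""
    while True:
        m = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        try:
            return m, inverse(m)
        except StopIteration:
            continue

def keygen(n, rng):
    s = [rng.randint(0, 1) for _ in range(n)]
    m1, m1_inv = random_invertible(n, rng)
    m2, m2_inv = random_invertible(n, rng)
    return s, m1, m1_inv, m2, m2_inv

def split(vec, s, split_on, rng):
    """Where the key bit equals split_on, split x into two random shares; else copy x."""
    a, b = [], []
    for x, bit in zip(vec, s):
        if bit == split_on:
            r = Fraction(rng.randint(-10, 10))
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b

def encrypt_index(p, key, rng):
    s, m1, _, m2, _ = key
    a, b = split(p, s, 1, rng)  # index vectors are split where s[j] == 1
    return mat_vec(transpose(m1), a), mat_vec(transpose(m2), b)

def encrypt_query(q, key, rng):
    s, _, m1_inv, _, m2_inv = key
    a, b = split(q, s, 0, rng)  # query vectors are split where s[j] == 0
    return mat_vec(m1_inv, a), mat_vec(m2_inv, b)

def encrypted_score(enc_index, enc_query):
    return dot(enc_index[0], enc_query[0]) + dot(enc_index[1], enc_query[1])

rng = random.Random(7)
key = keygen(4, rng)
p = [Fraction(1, 2), Fraction(1, 4), Fraction(0), Fraction(1, 4)]  # toy index vector
q = [Fraction(3), Fraction(0), Fraction(2), Fraction(1)]           # toy query vector
secure = encrypted_score(encrypt_index(p, key, rng), encrypt_query(q, key, rng))
print(secure == dot(p, q))
```

The score is preserved because `(M1^T a) · (M1^{-1} a_q) = a · a_q`, and the complementary splitting makes the two halves sum to the plaintext inner product, so a server holding only the encrypted vectors can still rank documents.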