Abstract
With the explosive growth in the number of documents, more and more data owners outsource their documents to the public cloud to reduce the cost of local data management. However, privacy leakage in the cloud remains a major challenge and has attracted increasing attention. In this article, we propose a secure and efficient document search scheme, named SES, built on both cloud and fog systems. All documents are symmetrically encrypted before being outsourced to the cloud, and an index vector is constructed from the keywords of each document. Specifically, we integrate the position information of keywords into the TF-IDF model to generate document vectors, which serve as accurate and intrinsic summaries of the documents. In a query request, a data user provides a set of keywords, which are first extended with the Word2Vec tool and then mapped to a query vector. Extending the keywords makes the query more comprehensive and accurate, and hence improves document search accuracy. To achieve forward and backward security, both the document vectors and the query vectors are appended with a specially designed vector. The relevance score between a document and a query is defined as the inner product of the document vector and the query vector, and the k most relevant documents are returned to the data user as the search results. To protect the contextual information stored in the document and query vectors, we encrypt the vectors with the secure kNN algorithm. To improve search efficiency, a searchable index structure for the document set is constructed based on the Diffie–Hellman key agreement algorithm. The analysis and simulation results show that the proposed scheme performs well in terms of both security and search efficiency.
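The retrieval pipeline described above (weighted document vectors, inner-product relevance, secure kNN encryption, top-k ranking) can be illustrated with a minimal Python sketch. It is not SES itself and rests on several assumptions: it uses plain smoothed TF-IDF weights instead of the position-weighted variant, skips Word2Vec query expansion, omits the appended vector used for forward and backward security and the Diffie–Hellman-based index, and uses the basic secure kNN split-and-multiply construction with a random split vector S and random matrices M1, M2. All function names are hypothetical.

```python
import math
import numpy as np

def tfidf_vectors(docs, vocab):
    """Plain TF-IDF document vectors (assumption: no keyword-position weighting)."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    vecs = []
    for words in tokenized:
        vec = []
        for term in vocab:
            tf = words.count(term) / len(words)
            df = sum(1 for w in tokenized if term in w)
            idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF (assumption)
            vec.append(tf * idf)
        vecs.append(np.array(vec, dtype=float))
    return vecs

def keygen(dim, rng):
    """Secret key: random split indicator S and two random (almost surely invertible) matrices."""
    s = rng.integers(0, 2, size=dim)
    m1 = rng.normal(size=(dim, dim))
    m2 = rng.normal(size=(dim, dim))
    return s, m1, m2

def encrypt_index(p, key, rng):
    """Split the document vector where S[j] == 1, then multiply by M1^T and M2^T."""
    s, m1, m2 = key
    p1, p2 = p.copy(), p.copy()
    for j in range(len(p)):
        if s[j] == 1:
            r = rng.normal()
            p1[j], p2[j] = r, p[j] - r  # p1[j] + p2[j] == p[j]
    return m1.T @ p1, m2.T @ p2

def trapdoor(q, key, rng):
    """Split the query vector where S[j] == 0, then multiply by M1^-1 and M2^-1."""
    s, m1, m2 = key
    q1, q2 = q.copy(), q.copy()
    for j in range(len(q)):
        if s[j] == 0:
            r = rng.normal()
            q1[j], q2[j] = r, q[j] - r  # q1[j] + q2[j] == q[j]
    return np.linalg.inv(m1) @ q1, np.linalg.inv(m2) @ q2

def score(enc_p, enc_q):
    # (M1^T p')·(M1^-1 q') + (M2^T p'')·(M2^-1 q'') = p'·q' + p''·q'' = p·q
    return float(enc_p[0] @ enc_q[0] + enc_p[1] @ enc_q[1])

def top_k(enc_index, enc_q, k):
    """Rank encrypted documents by their encrypted relevance score (cloud side)."""
    order = sorted(range(len(enc_index)),
                   key=lambda i: score(enc_index[i], enc_q), reverse=True)
    return order[:k]

# Usage on a toy corpus (hypothetical data).
rng = np.random.default_rng(0)
docs = ["cloud storage privacy", "fog computing cloud", "keyword search privacy privacy"]
vocab = ["cloud", "fog", "privacy", "search", "storage"]
vecs = tfidf_vectors(docs, vocab)
key = keygen(len(vocab), rng)
index = [encrypt_index(v, key, rng) for v in vecs]
q = np.array([0.0, 0.0, 1.0, 1.0, 0.0])  # query "privacy search", no expansion
t = trapdoor(q, key, rng)
print(top_k(index, t, 2))
```

The identity in `score` is what makes the construction useful: the cloud holds only the transformed vectors, yet the sum of the two encrypted inner products equals the plaintext inner product p·q, so it can rank documents without seeing the document or query vectors.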