Abstract

Traditional searchable encryption schemes construct document vectors based on the term frequency-inverse document frequency (TF-IDF) model. Such vectors are not only high-dimensional and sparse but also ignore the semantic information of the documents. The Sentence Bidirectional Encoder Representations from Transformers (SBERT) model can instead be used to train vectors that capture document semantics, enabling semantic-aware multi-keyword search. In this paper, we propose a privacy-preserving searchable encryption scheme based on the SBERT model. The document vectors trained by SBERT are used as input to the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering algorithm, which generates a soft cluster membership vector for each document. We treat each cluster as a topic, so that each entry of the vector represents the probability that the document belongs to the corresponding topic. Building on this clustering process, we propose the topic-term frequency-inverse topic frequency (TTF-ITF) model to generate keyword topic vectors. With the SBERT model, the searchable encryption scheme achieves more precise semantic-aware keyword search. In addition, a special index tree is used to improve search efficiency. Experimental results on real datasets demonstrate the effectiveness of our scheme.
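
The abstract does not state the TTF-ITF formula, so the following is only an illustrative sketch of how keyword topic vectors might be derived by analogy with TF-IDF. It assumes that the topic-term frequency is the membership-weighted count of a term in a topic, normalized by the topic's total weighted term count, and that the inverse topic frequency is log(number of topics / number of topics containing the term) + 1. The function name `ttf_itf`, the toy documents, and the membership values (standing in for HDBSCAN soft cluster membership vectors) are all hypothetical.

```python
import math
from collections import defaultdict


def ttf_itf(docs, memberships):
    """Build one topic vector per keyword (assumed TF-IDF-style weighting).

    docs        -- list of token lists, one per document
    memberships -- list of per-document topic probability vectors
                   (placeholder for HDBSCAN soft cluster memberships)
    """
    n_topics = len(memberships[0])
    weighted_counts = defaultdict(lambda: [0.0] * n_topics)
    topic_mass = [0.0] * n_topics  # total weighted term count per topic

    # Each term occurrence contributes to every topic in proportion to
    # the probability that its document belongs to that topic.
    for tokens, probs in zip(docs, memberships):
        for term in tokens:
            for t, p in enumerate(probs):
                weighted_counts[term][t] += p
                topic_mass[t] += p

    vectors = {}
    for term, counts in weighted_counts.items():
        topics_with_term = sum(1 for c in counts if c > 0)
        # Assumed smoothing: +1 keeps terms that occur in every topic
        # from collapsing to zero.
        itf = math.log(n_topics / topics_with_term) + 1.0
        vectors[term] = [
            (c / topic_mass[t] if topic_mass[t] > 0 else 0.0) * itf
            for t, c in enumerate(counts)
        ]
    return vectors


# Toy data: three tokenized documents and their two-topic memberships.
docs = [
    ["cloud", "encryption"],
    ["encryption", "index"],
    ["football", "match"],
]
memberships = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]

vectors = ttf_itf(docs, memberships)
for term in sorted(vectors):
    print(term, [round(w, 3) for w in vectors[term]])
```

In this toy run, "encryption" is weighted more heavily toward the first topic, while "football" has zero weight there because its only document has zero membership in that topic. The actual scheme may define and normalize these quantities differently.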

