Preserving the privacy of the ever-growing volume of multimedia data on the cloud while providing accurate and fast retrieval services has become a hot topic in information security. However, existing schemes still leave significant room for improvement in accuracy and speed. This paper therefore proposes a privacy-preserving image–text retrieval scheme called PITR. To enhance model performance while training only a small number of parameters, we freeze all parameters of a multimodal pre-trained model and add trainable modules together with either a general adapter or a specialized adapter, which strengthen the model’s zero-shot image classification and cross-modal retrieval on general or specialized datasets, respectively. To preserve the privacy of data outsourced to the cloud and of the user’s retrieval process, we employ asymmetric scalar-product-preserving encryption, which is suited to inner-product computation, together with distributed index storage, and we construct a two-level security model. To speed up query matching among massive numbers of high-dimensional index vectors, we construct a hierarchical index structure. Experimental results demonstrate that our scheme provides users with a secure, accurate, and fast cross-modal retrieval service while preserving data privacy.
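As a rough illustration of why asymmetric scalar-product-preserving encryption suits inner-product matching, the sketch below shows the textbook form of the idea: an index vector p is stored as Mᵀp, a query vector q is sent as M⁻¹q, and their inner product equals p·q although the two sides are transformed differently. This is a minimal sketch under stated assumptions and not PITR's actual construction, which may add further steps such as vector splitting or extra random dimensions. All function names are hypothetical, and exact fractions stand in for the floating-point embeddings a real system would use.

```python
import random
from fractions import Fraction


def invert(M):
    """Gauss-Jordan inversion over exact fractions; returns None if M is singular."""
    d = len(M)
    # Augment M with the identity matrix: [M | I].
    A = [row[:] + [Fraction(int(i == j)) for j in range(d)] for i, row in enumerate(M)]
    for col in range(d):
        pivot = next((r for r in range(col, d) if A[r][col] != 0), None)
        if pivot is None:
            return None
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(d):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[d:] for row in A]


def random_invertible_matrix(d, rng):
    """Hypothetical key generation: a random d x d secret matrix M and its inverse."""
    while True:
        M = [[Fraction(rng.randint(-9, 9)) for _ in range(d)] for _ in range(d)]
        M_inv = invert(M)
        if M_inv is not None:
            return M, M_inv


def encrypt_index(M, p):
    """Index side: p' = M^T p."""
    d = len(p)
    return [sum(M[i][j] * p[i] for i in range(d)) for j in range(d)]


def encrypt_query(M_inv, q):
    """Query side: q' = M^{-1} q."""
    d = len(q)
    return [sum(M_inv[i][j] * q[j] for j in range(d)) for i in range(d)]


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Demo with made-up 4-dimensional vectors (illustrative values only).
rng = random.Random(0)
M, M_inv = random_invertible_matrix(4, rng)
p = [Fraction(x) for x in (1, 2, 3, 4)]   # stands in for an index (image) embedding
q = [Fraction(x) for x in (2, 0, -1, 5)]  # stands in for a query (text) embedding
encrypted_score = dot(encrypt_index(M, p), encrypt_query(M_inv, q))
plain_score = dot(p, q)
```

Because (Mᵀp)·(M⁻¹q) = pᵀMM⁻¹q = pᵀq, a server holding only the transformed vectors can rank candidates by similarity without seeing p or q in the clear.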