Efficient privacy-preserving outsourced k-means clustering on distributed data

Guowei Qiu, Yingliang Zhao, Xiaolin Gui

doi:10.1016/j.ins.2024.120687

Abstract

Today, more and more data is collected and stored by different organizations. When the data is distributed among multiple users who wish to perform data mining on the joint data, outsourcing the task to cloud servers is an attractive option for users who lack professional skills. However, privacy concerns make users reluctant to adopt such solutions. In this paper, we propose a privacy-preserving outsourced k-means clustering (PPOKC) algorithm on distributed data. Our system consists of multiple users and two non-colluding servers. Each user submits their data to the servers, which perform k-means clustering over the joint data without compromising data privacy. Unlike existing solutions, which typically separate a cryptographic server from a computing server, our system has each server provide both services. Based on this architecture, we first design several important sub-protocols, including secure comparison, secure minimum and secure division. We then use these protocols to construct an efficient and highly secure PPOKC algorithm. The implementation relies heavily on secret sharing techniques, complemented by homomorphic encryption. We also conduct theoretical analysis and experiments. Security analysis shows that, under the semi-honest model, our scheme protects the input/output data, the intermediate results and the data access pattern. Complexity analysis and numerical experiments show that our algorithm is efficient and suitable for practical applications.
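To make the two-server secret-sharing setting concrete, the following is a minimal sketch of generic two-party additive secret sharing, not the authors' PPOKC protocols. It assumes shares live in Z_P for an illustrative public modulus P, and that multiplication uses Beaver triples from a hypothetical trusted dealer (the paper instead combines secret sharing with homomorphic encryption, which is not reproduced here). It shows the two building blocks k-means needs on shared data: local addition (for centroid sums) and interactive multiplication (for squared distances). The secure comparison, minimum and division sub-protocols are not shown. All function names are hypothetical.

```python
import secrets

# Assumption: illustrative public modulus; 2**61 - 1 is a Mersenne prime.
P = 2**61 - 1

def share(x):
    """Split integer x into two additive shares, one for each server."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def open_mod(s):
    """Combine both shares into the value mod P (requires both servers)."""
    return (s[0] + s[1]) % P

def reconstruct(s0, s1):
    """Recombine shares; values above P // 2 are read as negative integers."""
    v = open_mod((s0, s1))
    return v - P if v > P // 2 else v

def add(a, b):
    """Each server adds its own shares locally; no communication needed."""
    return (a[0] + b[0]) % P, (a[1] + b[1]) % P

def sub(a, b):
    """Local subtraction of shared values."""
    return (a[0] - b[0]) % P, (a[1] - b[1]) % P

def beaver_triple():
    """Hypothetical trusted dealer: shares of random u, v and w = u*v."""
    u, v = secrets.randbelow(P), secrets.randbelow(P)
    return share(u), share(v), share(u * v % P)

def mul(a, b):
    """Beaver-triple multiplication: one round in which the servers open masked values."""
    u, v, w = beaver_triple()
    e = open_mod(sub(a, u))  # a - u, uniformly masked, so it reveals nothing about a
    f = open_mod(sub(b, v))  # b - v, uniformly masked, so it reveals nothing about b
    # a*b = w + e*v + f*u + e*f; only server 0 adds the public term e*f.
    z0 = (w[0] + e * v[0] + f * u[0] + e * f) % P
    z1 = (w[1] + e * v[1] + f * u[1]) % P
    return z0, z1

def sq_distance(x_sh, c_sh):
    """Shared squared Euclidean distance between a shared point and a shared centroid."""
    acc = (0, 0)
    for xi, ci in zip(x_sh, c_sh):
        d = sub(xi, ci)
        acc = add(acc, mul(d, d))
    return acc

def shared_sum(points_sh):
    """Coordinate-wise sum of shared points (numerator of a centroid update)."""
    dim = len(points_sh[0])
    total = [(0, 0)] * dim
    for p in points_sh:
        total = [add(t, c) for t, c in zip(total, p)]
    return total

if __name__ == "__main__":
    x = [share(v) for v in [3, -4]]
    c = [share(v) for v in [0, 0]]
    print(reconstruct(*sq_distance(x, c)))  # 25
    pts = [[share(v) for v in p] for p in [[1, 2], [3, 4], [5, 6]]]
    print([reconstruct(*s) for s in shared_sum(pts)])  # [9, 12]
```

In this sketch, neither server's share alone carries information about a user's data. Addition is free, but every multiplication costs a communication round. This is why protocols of this kind spend most of their design effort on the interactive steps (comparison, minimum, division) that the assignment and update phases of k-means require.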
