Outsourced cloud computing is an effective way to break down data islands among users and relieve the pressure of limited local resources. However, because cloud servers are not fully trusted, outsourcing users’ data and model training tasks carries a considerable risk of privacy disclosure. This paper presents PriKPM, a scheme based on additive secret sharing (ASS) that implements privacy-preserving k-prototype clustering for mixed data (i.e., data with both numerical and categorical attributes). In PriKPM, each data sample is randomly split into two shares, which are delivered offline to two collaborative servers. We design a secure initialization method for determining the number and locations of the cluster centers. The two servers then securely compute the mixed distance between samples and cluster centers, and carry out the sample partition and cluster update operations. An efficient and secure comparison protocol is developed to flexibly provide the “less than or equal” and “equal” functions throughout the clustering process. Furthermore, theoretical analysis proves the effectiveness and security of PriKPM. Extensive experiments demonstrate that PriKPM is computationally more efficient than existing secure clustering works, while achieving accuracy close to that of the plaintext k-prototype clustering scheme.
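To illustrate the two-share splitting mentioned above, the following is a minimal sketch of generic two-party additive secret sharing over an integer ring. It is not the PriKPM protocol itself: the ring size `MODULUS` and the helper names `share`, `reconstruct`, and `add_shares` are assumptions made for this example, and the abstract does not specify them.

```python
import secrets
from typing import Tuple

# Assumed ring size for illustration only; the abstract does not state one.
MODULUS = 2 ** 32


def share(value: int, modulus: int = MODULUS) -> Tuple[int, int]:
    """Split a value into two additive shares modulo `modulus`.

    One share is uniformly random, so either share alone reveals
    nothing about the value.
    """
    share_a = secrets.randbelow(modulus)
    share_b = (value - share_a) % modulus
    return share_a, share_b


def reconstruct(share_a: int, share_b: int, modulus: int = MODULUS) -> int:
    """Recover the value by adding both shares modulo `modulus`."""
    return (share_a + share_b) % modulus


def add_shares(
    x_shares: Tuple[int, int],
    y_shares: Tuple[int, int],
    modulus: int = MODULUS,
) -> Tuple[int, int]:
    """Add two shared values; each server only adds the shares it holds."""
    return (
        (x_shares[0] + y_shares[0]) % modulus,
        (x_shares[1] + y_shares[1]) % modulus,
    )


if __name__ == "__main__":
    x_shares = share(25)   # one share per server
    y_shares = share(17)
    z_shares = add_shares(x_shares, y_shares)
    print(reconstruct(*z_shares))  # sum of the two secrets
```

Addition of shared values needs no interaction between the servers, which is one reason additive sharing is attractive for distance computations. Comparisons such as “less than or equal” cannot be evaluated locally on shares and require a dedicated interactive protocol.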