Privacy-preserving kernel k-means clustering outsourcing with random transformation

Keng-Pei Lin

doi:10.1007/s10115-016-0923-2

Abstract

Clustering is the task of organizing data into groups of similar items. Kernel k-means identifies clusters in nonlinearly separable data by applying the kernel trick to the widely used k-means algorithm, so that the data are grouped in the kernel-induced feature space. Because kernel k-means is computationally costly owing to its quadratic complexity, a data owner with limited computing resources can benefit from outsourcing the computation to an external service provider. However, data privacy is a critical concern in outsourcing, since the data may contain sensitive information. Existing privacy-preserving outsourcing schemes for general kernel methods rely on distance preservation and offer weak security. We propose a privacy-preserving outsourcing scheme for kernel k-means based on a random linear transformation and random perturbation of the kernel matrix. The data sent to the service provider are encrypted, and the service provider solves the kernel k-means problem on the encrypted data. The proposed scheme is much stronger in security than existing work, and experimental results show that the proposed privacy-preserving kernel k-means method achieves clustering performance similar to that of a normal large-scale kernel k-means algorithm while imposing very little overhead on the data owner.
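
For readers unfamiliar with the baseline, the following is a minimal sketch of plain (non-private) kernel k-means operating only on a kernel matrix. It does not implement the proposed outsourcing scheme, the random transformation, or the perturbation. The function names (rbf_kernel, kernel_kmeans), the RBF kernel choice, and all parameters are assumptions made for illustration. It shows where the quadratic cost arises: every update needs the full n-by-n kernel matrix.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Illustrative RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, k, init=None, n_iter=50, seed=0):
    """Lloyd-style kernel k-means on a precomputed kernel matrix K (n x n).

    Distances to cluster means in feature space are computed with the
    kernel trick:
      ||phi(x_i) - mu_c||^2 = K_ii - (2/m) * sum_{j in c} K_ij
                              + (1/m^2) * sum_{j,l in c} K_jl
    """
    n = K.shape[0]
    if init is None:
        # Hypothetical default: random initial assignment.
        labels = np.random.default_rng(seed).integers(0, k, size=n)
    else:
        labels = np.asarray(init).copy()
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay at infinity
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue
            cross = K[:, mask].sum(axis=1) / m
            within = K[np.ix_(mask, mask)].sum() / (m * m)
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels

# Two small, well-separated groups with a deliberately mixed-up start.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
K = rbf_kernel(X, gamma=1.0)
labels = kernel_kmeans(K, k=2, init=[0, 0, 1, 0, 1, 1])
```

Because the algorithm touches the data only through K, the kernel matrix is the natural object to protect when the computation is handed to a third party; how the proposed scheme does so is not shown in this sketch.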
