Abstract

Clustering is a fundamental branch of data mining that is widely used in practical applications such as user purchase model analysis, image color segmentation, and outlier detection. With the increasing popularity of cloud computing, more and more encrypted data are moving to cloud platforms, so that data owners can enjoy the advantages of the cloud computing paradigm while mitigating serious data privacy concerns. However, traditional data encryption renders existing clustering schemes ineffective, which obstructs effective data utilization and hinders the wide adoption of cloud computing. In this paper, we focus on solving the clustering problem over encrypted cloud data. In particular, we propose PPK-means, a privacy-preserving k-means clustering technique over encrypted multi-dimensional cloud data that leverages the scalar-product-preserving encryption primitive. The proposed technique achieves efficient multi-dimensional data clustering while preserving the confidentiality of the outsourced cloud data. To the best of our knowledge, our work is the first to explore privacy-preserving multi-dimensional data clustering in the cloud computing environment. Extensive experiments on simulated and real-life data-sets demonstrate that PPK-means is secure, efficient, and practical.

Highlights

  • In the big data era, data mining helps people quickly discover new and valuable knowledge from large-scale data-sets, and it has been used in various fields such as finance, power, insurance, and biology.

  • We consider the cloud-based outsourcing environment and propose a privacy-preserving k-means clustering scheme over encrypted multi-dimensional cloud data, which achieves the following three basic goals: (1) efficiently processing multi-dimensional data; (2) letting the cloud server alone perform clustering tasks without cooperating parties; (3) achieving data semantic security against the “honest-but-curious”

  • As long as the key M is kept secret from the cloud server, PPK-means 1 is secure under the Known Ciphertext Model, i.e., the cloud server knows nothing except the encrypted center points and the user’s data
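
The role of the secret key M can be illustrated with a minimal sketch of generic scalar-product-preserving encryption. This is not the PPK-means construction itself, which this summary does not spell out. The sketch assumes that M is an invertible matrix, that data points are encrypted with M^T and center points with M^{-1}, and that vectors are extended by one coordinate so that a single scalar product is enough to compare distances. The helper names (encrypt_point, encrypt_center, nearest_center) and the toy matrix are hypothetical.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def invert(m):
    """Gauss-Jordan inverse using exact rational arithmetic."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Raises StopIteration if m is singular (no non-zero pivot exists).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def encrypt_point(M, p):
    # Assumed scheme: extend p to (p, 1), then multiply by M^T.
    return mat_vec(transpose(M), list(p) + [1])


def encrypt_center(M_inv, c):
    # Assumed scheme: extend c to (-2c, |c|^2), then multiply by M^{-1}.
    return mat_vec(M_inv, [-2 * x for x in c] + [dot(c, c)])


def nearest_center(enc_p, enc_centers):
    # (M^T p') . (M^{-1} c') = p' . c' = |c|^2 - 2 p.c = |p - c|^2 - |p|^2.
    # |p|^2 is the same for every center, so the smallest score marks the
    # nearest center, and the server never sees p or c in the clear.
    scores = [dot(enc_p, ec) for ec in enc_centers]
    return scores.index(min(scores))


# Toy secret key for 2-D data (3x3 because of the extra coordinate).
M = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
M_inv = invert(M)

centers = [[0, 0], [10, 10]]
enc_centers = [encrypt_center(M_inv, c) for c in centers]
enc_p = encrypt_point(M, [9, 8])
closest = nearest_center(enc_p, enc_centers)
```

In this sketch the server only ever handles enc_p and enc_centers. Without M it cannot undo the linear transformation, yet the scalar products it computes still rank the centers exactly as the plaintext distances would.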


Summary

Motivation

In the big data era, data mining helps people quickly discover new and valuable knowledge from large-scale data-sets, and it has been used in various fields such as finance, power, insurance, and biology. Researchers have used secure multiparty computation protocols to construct several privacy-preserving k-means clustering schemes [8,9,10,11]. These solutions require participants to cooperatively finish the clustering tasks without revealing any of their individual data items. In such solutions, some intermediate computations have to rely on unencrypted data [7], which is not suitable for the cloud-based data outsourcing paradigm, as exposed plaintext data compromise semantic security. Another line of research utilizes data perturbation techniques such as differential privacy [12] to achieve privacy-preserving data mining. We consider the cloud-based outsourcing environment and propose a privacy-preserving k-means clustering scheme over encrypted multi-dimensional cloud data, which achieves the following three basic goals: (1) efficiently processing multi-dimensional data; (2) letting the cloud server alone perform clustering tasks without cooperating parties; (3) achieving data semantic security against the “honest-but-curious”

Contributions
Related Work
System Model
Results
Threat Model
Basic Techniques
Scalar-Product-Preserving Encryption
Work Flow
Framework
Algorithm Overview
PPK-Means Construction
PPK-Means 1
PPK-Means 2
PPK-Means 3
Privacy Analysis
Time Complexity Analysis
Experimental Evaluation
Limitation
Conclusions
