Abstract

As a commonly used algorithm in data mining, clustering has been widely applied in many fields, such as machine learning, information retrieval, and pattern recognition. In practice, the data to be analyzed are often distributed among multiple parties. Moreover, the rapidly increasing data volume places a heavy computational burden on data owners. Thus, data owners tend to outsource their data to cloud servers and obtain analysis results over the federated data. However, existing privacy-preserving outsourced k-means schemes cannot verify whether participants share consistent data. Considering scenarios with multiple data owners and the security of sensitive information in an outsourced environment, we propose a verifiable privacy-preserving federated k-means clustering scheme. In this article, cloud servers and participants run the k-means clustering algorithm over encrypted data without exposing private data or the intermediate results of each iteration. In particular, our scheme uses secret sharing, hash functions, and blockchain to verify the shares from participants when updating the cluster centers, so that it can resist inconsistent share attacks by malicious participants. Finally, security and experimental analyses show that our scheme protects private data and obtains high-accuracy clustering results.
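The abstract does not spell out how the shares are verified. As a minimal sketch, assuming additive secret sharing modulo a prime and SHA-256 hash commitments published to an append-only ledger (a plain Python list standing in for the blockchain), the following code shows how a recipient can reject a share that does not match what its sender committed to. The modulus `P` and the functions `share`, `commit`, `verify`, `deal`, and `aggregate` are hypothetical illustrations, not the authors' construction; encryption of the data and fixed-point encoding of real-valued coordinates are omitted.

```python
import hashlib
import secrets
from typing import List, Optional, Tuple

# Assumed public prime modulus for additive shares (2**61 - 1 is a Mersenne prime).
P = 2**61 - 1


def share(value: int, n: int) -> List[int]:
    """Split value into n additive shares that sum to value modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def commit(share_value: int, nonce: bytes) -> str:
    """Hash commitment H(share || nonce); the random nonce hides the share."""
    return hashlib.sha256(share_value.to_bytes(8, "big") + nonce).hexdigest()


def verify(share_value: int, nonce: bytes, commitment: str) -> bool:
    """Recompute the hash and compare it with the published commitment."""
    return commit(share_value, nonce) == commitment


def deal(value: int, n: int) -> Tuple[List[int], List[bytes], List[str]]:
    """Create shares, their nonces, and the commitments a participant would publish."""
    shares = share(value, n)
    nonces = [secrets.token_bytes(16) for _ in shares]
    ledger = [commit(s, r) for s, r in zip(shares, nonces)]  # stand-in for on-chain records
    return shares, nonces, ledger


def aggregate(values: List[int], n_servers: int, cheater: Optional[int] = None) -> int:
    """Each server checks and adds the shares it receives; the totals are then recombined.

    If `cheater` is set, that participant sends server 0 a share that differs
    from the one it committed to, which verification should catch.
    """
    totals = [0] * n_servers
    for i, v in enumerate(values):
        shares, nonces, ledger = deal(v, n_servers)
        sent = list(shares)
        if i == cheater:
            sent[0] = (sent[0] + 1) % P  # inconsistent with the published commitment
        for j in range(n_servers):
            if not verify(sent[j], nonces[j], ledger[j]):
                raise ValueError(f"participant {i} sent an inconsistent share to server {j}")
            totals[j] = (totals[j] + sent[j]) % P
    return sum(totals) % P


print(aggregate([12, 30, 7], n_servers=2))  # 49
try:
    aggregate([12, 30, 7], n_servers=2, cheater=1)
except ValueError as err:
    print("rejected:", err)
```

Each value here could be, for example, one participant's local sum for one coordinate of one cluster. Each server on its own sees only uniformly random shares, and the total is recovered only after all verified shares are combined.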

Highlights

  • Data mining technology can be used to analyze and extract potentially valuable information from large collections of data

  • As a well-known clustering algorithm, the k-means clustering algorithm [3] has the advantages of a simple process and good clustering results; it assigns data to k clusters based on their distances from the cluster centers

  • We propose a multi-party verifiable privacy-preserving federated k-means scheme for horizontally partitioned data
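For horizontally partitioned data, each party holds complete records for a subset of the samples, so the k-means center update decomposes into per-cluster sums and counts that can be computed locally and then added. The plaintext Python sketch below illustrates only that decomposition; it deliberately leaves out the encryption, secret sharing, and verification that the proposed scheme adds, and the function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_stats(rows: Sequence[Sequence[float]], centers: List[Vector]) -> Tuple[List[Vector], List[int]]:
    """Run on one party's own rows: assign each row to its nearest center and
    return per-cluster coordinate sums and counts."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for row in rows:
        c = min(range(k), key=lambda j: sq_dist(row, centers[j]))
        counts[c] += 1
        for t in range(d):
            sums[c][t] += row[t]
    return sums, counts


def federated_kmeans(parties, centers, max_iter: int = 20) -> List[Vector]:
    """Aggregate every party's local statistics to update the shared centers."""
    centers = [list(map(float, c)) for c in centers]
    k, d = len(centers), len(centers[0])
    for _ in range(max_iter):
        total_sums = [[0.0] * d for _ in range(k)]
        total_counts = [0] * k
        for rows in parties:
            sums, counts = local_stats(rows, centers)
            for j in range(k):
                total_counts[j] += counts[j]
                for t in range(d):
                    total_sums[j][t] += sums[j][t]
        # An empty cluster keeps its previous center.
        new_centers = [
            [s / total_counts[j] for s in total_sums[j]] if total_counts[j] else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:  # centers converged
            break
        centers = new_centers
    return centers


party_a = [(0, 0), (0, 2)]
party_b = [(10, 10), (10, 12)]
party_c = [(0, 1), (10, 11)]
init = [(0, 0), (10, 10)]
print(federated_kmeans([party_a, party_b, party_c], init))  # [[0.0, 1.0], [10.0, 11.0]]
```

Because the aggregated sums and counts equal those of the pooled data, this loop yields the same centers as centralized k-means started from the same initial centers. The privacy problem the scheme addresses is that these sums, counts, and assignments would otherwise be visible to whoever performs the aggregation.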

Summary

Introduction

Data mining technology can be used to analyze and extract potentially valuable information from large collections of data. Vaidya and Clifton [10] first proposed a multi-party privacy-preserving k-means clustering protocol on vertically partitioned data, in which secure distance computation and comparison are supported by a secure permutation scheme and homomorphic encryption. Liu et al. [23], following the framework in [24], presented a privacy-preserving outsourced k-means clustering protocol in which one party outsources the distance computation to a cloud server without revealing either the data or the clustering results to any other party or to the cloud server. Jiang et al. [25] introduced an efficient two-party privacy-preserving k-means clustering protocol that computes distances securely using the subprotocols in [26] and updates cluster centers using the garbled circuits proposed in [27]. We propose a multi-party verifiable privacy-preserving federated k-means scheme for horizontally partitioned data.
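In the vertically partitioned setting of Vaidya and Clifton [10], each party holds a different subset of attributes for every record, so the squared Euclidean distance from a record to a cluster center is the sum of partial distances that each party can compute over its own attributes. The following Python sketch shows only this decomposition; the secure permutation and homomorphic-encryption steps used in [10] to compare the summed distances are not shown, and the helper names and column split are hypothetical.

```python
from typing import List, Sequence, Tuple


def partial_sq_dist(point_block: Sequence[float], center_block: Sequence[float]) -> float:
    """Squared distance over the attributes held by one party."""
    return sum((x - c) ** 2 for x, c in zip(point_block, center_block))


def split_columns(vec: Sequence[float], blocks: List[Tuple[int, int]]) -> List[Sequence[float]]:
    """Cut a full vector into the column ranges (start, end) held by each party."""
    return [vec[s:e] for s, e in blocks]


def vertical_sq_dist(point: Sequence[float], center: Sequence[float], blocks: List[Tuple[int, int]]) -> float:
    """Add up the partial distances that each party computes locally."""
    partials = [
        partial_sq_dist(p, c)
        for p, c in zip(split_columns(point, blocks), split_columns(center, blocks))
    ]
    return sum(partials)


point = [1.0, 2.0, 3.0, 4.0]
center = [0.0, 0.0, 1.0, 1.0]
blocks = [(0, 2), (2, 4)]  # hypothetical: party A holds attributes 0-1, party B holds 2-3
print(vertical_sq_dist(point, center, blocks))  # 18.0
print(partial_sq_dist(point, center))  # 18.0, the same distance computed on the full record
```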

Preliminaries
Participants
Our Construction
Step 1
Step 2
Step 3
Security Analysis
Performance Analysis
Findings
Conclusion
