Abstract
The protection and processing of sensitive data in big data systems are common problems as the increase in data size increases the need for high processing power. Protection of the sensitive data on a system that contains multiple connections with different privacy policies, also brings the need to use proper cryptographic key exchange methods for each party, as extra work. Homomorphic encryption methods can perform similar arithmetic operations on encrypted data in the same way as a plain format of the data. Thus, these methods provide data privacy, as data are processed in the encrypted domain, without the need for a plain form and this allows outsourcing of the computations to cloud systems. This also brings simplicity on key exchange sessions for all sides. In this paper, we propose novel privacy preserving clustering methods, alongside homomorphic encryption schemes that can run on a common high performance computation platform, such as a cloud system. As a result, the parties of this system will not need to possess high processing power because the most power demanding tasks would be done on any cloud system provider. Our system offers a privacy preserving distance matrix calculation for several clustering algorithms. Considering both encrypted and plain forms of the same data for different key and data lengths, our privacy preserving training method’s performance results are obtained for four different data clustering algorithms, while considering six different evaluation metrics.
Highlights
In recent years, there is an increasing demand for outsourced cloud systems that allow tenants to rapidly handle sensitive data that are collected from systems, including military systems, health care systems, or banking systems
We propose novel privacy preserving clustering methods, alongside homomorphic encryption schemes that can run on a common high performance computation platform, such as a cloud system
The clustering methods that have been described in Section 3.1 were examined by running each of them on 10 different datasets using five different crypto key lengths
Summary
There is an increasing demand for outsourced cloud systems that allow tenants to rapidly handle sensitive data that are collected from systems, including military systems, health care systems, or banking systems. Cloud platforms contain data batches with different privacy policies; they still have to be analyzed in a mutual way. Consider the case of different medical institutes that want to jointly build a disease diagnosis model using a machine learning algorithm. In this case, privacy policies and General Data Protection Regulation (GDPR) prevent these medical institutes from sharing with each other [2,3]. Privacy policies and General Data Protection Regulation (GDPR) prevent these medical institutes from sharing with each other [2,3] In this case, traditional machine learning methods cannot be applied. Sensitive data cannot be distributed publicly every time, due to the different privacy policies that different parties have
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have