Abstract

Most distributed data mining algorithms can efficiently manage and mine complete data from distributed sources. For incomplete data, however, they must be modified so that mining still produces good results while the privacy of sensitive information is maintained. Classification is an important data mining task that aims to discover knowledge and classify new instances. The support vector machine (SVM) is one of the most important algorithms for classification problems in many domains. In this paper, we propose a new distributed privacy-preserving protocol with multiple imputation of missing or incomplete data. Missing data are handled by multiple imputation based on multivariate imputation by chained equations (MICE), and the Paillier cryptosystem is used to maintain the privacy of the participants. Finally, we construct a global SVM model over vertically partitioned data, based on the Gram matrix, by introducing a semi-honest third party; the model is built without revealing the participants' private data and is then used to classify new instances. The performance of the proposed protocol was evaluated using the accuracy metric on distributed and centralized data. Our experimental results show that accuracy on distributed data is the same as on centralized data, and that imputed data give better results than when the incomplete data are omitted. The protocol also achieves better processing time on distributed data than on centralized data.
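To make two of the building blocks named above concrete, the following is a minimal sketch and not the protocol proposed in the paper. It relies on one fact: for a linear kernel over vertically partitioned data, the global Gram matrix is the sum of the parties' local Gram matrices, so it can be assembled with Paillier's additive homomorphism without any party disclosing its own matrix. All function names are hypothetical, the primes are toy-sized and insecure, features are assumed to be small non-negative integers (real values would need a fixed-point encoding), and the choice of who holds the private key is an assumption made only for illustration.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu = (L(g^lam mod n^2))^-1 mod n, with L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n as g^m * r^n mod n^2."""
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def local_gram(X):
    """Linear-kernel Gram matrix X X^T for one party's column slice."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]


def add_encrypted(pub, A, B):
    """Entrywise ciphertext product, which adds the underlying plaintexts."""
    n, _ = pub
    return [[(a * b) % (n * n) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def secure_gram(pub, priv, slices):
    """Sum the parties' encrypted local Gram matrices, then decrypt the sum."""
    total = None
    for X in slices:
        enc = [[encrypt(pub, v) for v in row] for row in local_gram(X)]
        total = enc if total is None else add_encrypted(pub, total, enc)
    # Only the aggregated matrix is decrypted (assumed key holder).
    return [[decrypt(pub, priv, c) for c in row] for row in total]


# Two parties hold different feature columns of the same three records.
party_a = [[1, 2], [3, 4], [0, 1]]
party_b = [[5], [6], [2]]
pub, priv = keygen()
K = secure_gram(pub, priv, [party_a, party_b])
```

The resulting matrix `K` matches the Gram matrix computed on the joined records, which is what allows a standard SVM solver that accepts a precomputed kernel to train on it. Every summed entry must stay below the modulus `n`, so a real deployment would need a much larger key.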

