Accurate and large-scale privacy-preserving data mining using the election paradigm

Emmanouil Magkos, Manolis Maragoudakis, Vassilis Chrissikopoulos, Stefanos Gritzalis

doi:10.1016/j.datak.2009.06.003


https://doi.org/10.1016/j.datak.2009.06.003


Abstract

With the proliferation of the Web and of information and communication technologies (ICT), concerns have grown about how data mining systems handle and use sensitive information. Recent research has focused on distributed environments in which the participants may also be mutually mistrustful. In this paper, we discuss the design and security requirements for large-scale privacy-preserving data mining (PPDM) systems in a fully distributed setting, where each client possesses its own records of private data. To this end, we argue in favor of using well-known cryptographic primitives borrowed from the literature on Internet elections. More specifically, our framework is based on the classical homomorphic election model, and in particular on an extension that supports multi-candidate elections. We also review a recent scheme [Z. Yang, S. Zhong, R.N. Wright, Privacy-preserving classification of customer data without loss of accuracy, in: SDM 2005 SIAM International Conference on Data Mining, 2005], which was the first to use the homomorphic encryption primitive for PPDM in the fully distributed setting. Finally, we show how our approach can be used as a building block to obtain Random Forests classification with enhanced prediction performance.
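To make the homomorphic election idea concrete, the following is a minimal sketch of how per-class frequency counts could be aggregated without revealing any single client's value. It is an illustration under stated assumptions, not the construction from the paper: it uses a toy Paillier cryptosystem with fixed, insecure key sizes, encodes a "vote" for class j as base**j (one common way to support multiple candidates), and omits the threshold key sharing and validity proofs that a real election-style scheme would need. All function names and parameters are hypothetical.

```python
import math
import random

def keygen(p=2**31 - 1, q=2**61 - 1):
    # Toy key from two known Mersenne primes; far too small for real use.
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)  # modular inverse (Python 3.8+)
    return n, (phi, mu)

def encrypt(n, m):
    # Paillier encryption with generator g = n + 1: c = (1 + m*n) * r^n mod n^2.
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    # Recover m from c using the private exponent phi and its inverse mu.
    phi, mu = priv
    u = pow(c, phi, n * n)
    return ((u - 1) // n) * mu % n

def tally(n, ciphertexts):
    # Multiplying ciphertexts adds the underlying plaintexts.
    n2 = n * n
    acc = 1
    for c in ciphertexts:
        acc = acc * c % n2
    return acc

def decode_counts(total, base, num_classes):
    # Read the per-class counts as the base-`base` digits of the total.
    counts = []
    for _ in range(num_classes):
        total, digit = divmod(total, base)
        counts.append(digit)
    return counts

def private_class_counts(labels, num_classes):
    # Each client encrypts base**label; only the aggregate is decrypted.
    n, priv = keygen()
    base = len(labels) + 1  # must exceed the number of clients
    ballots = [encrypt(n, base ** label) for label in labels]
    total = decrypt(n, priv, tally(n, ballots))
    return decode_counts(total, base, num_classes)
```

For example, `private_class_counts([0, 2, 1, 2, 2], 3)` decrypts only the product of the five ciphertexts and yields the class histogram, which is the kind of frequency count a miner needs for classifiers such as Naive Bayes or decision trees. In this sketch a single party holds the private key; that is a simplification made only for brevity.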
