Privacy-preserving Naïve Bayes classification

Jaideep Vaidya, Murat Kantarcıoğlu, Chris Clifton

doi:10.1007/s00778-006-0041-y

Abstract

Privacy-preserving data mining (developing models without seeing the data) is receiving growing attention. This paper assumes a privacy-preserving distributed data mining scenario: data sources collaborate to develop a global model but must not disclose their data to others. Secure distributed classification is an important problem. In many situations, data is split between multiple organizations. These organizations may want to use all of the data to build more accurate predictive models while revealing neither their training data nor the instances to be classified. Naïve Bayes is often used as a baseline classifier because it consistently provides reasonable classification performance. This paper brings privacy preservation to that baseline, presenting protocols to build a Naïve Bayes classifier on both vertically and horizontally partitioned data.
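To make the horizontally partitioned case concrete, here is a minimal sketch of one common approach. It assumes that each site holds complete rows under a shared schema and that the class labels and attribute domains are public. Every site computes its local Naïve Bayes counts, and a secure-sum step combines them so that only the global totals are revealed. The secure sum is simulated in a single process and assumes semi-honest, non-colluding parties. The Laplace smoothing and every name (`secure_sum`, `global_counts`, `classify`, `MODULUS`, the toy data) are hypothetical illustrations, not the paper's exact protocol. The vertically partitioned case needs different machinery and is not shown.

```python
import math
import random
from collections import Counter

# Hypothetical public modulus; it must exceed any possible count.
MODULUS = 2**61 - 1


def secure_sum(local_values, modulus=MODULUS):
    """Single-process simulation of a ring-based secure sum.

    Party 0 draws a random mask R. The running total R + v_0 + v_1 + ...
    travels around the ring modulo `modulus`, so each party sees only a
    masked partial sum. Party 0 finally removes R. Assumes semi-honest,
    non-colluding parties.
    """
    mask = random.SystemRandom().randrange(modulus)
    running = mask
    for v in local_values:
        running = (running + v) % modulus
    return (running - mask) % modulus


def local_counts(records):
    """Counts one party computes on its own rows: class counts and
    (attribute index, value, class) co-occurrence counts."""
    counts = Counter()
    for features, label in records:
        counts[("class", label)] += 1
        for i, v in enumerate(features):
            counts[("attr", i, v, label)] += 1
    return counts


def global_counts(parties, classes, domains):
    """Securely sum each count over a public key space. Iterating over public
    keys (not each party's own keys) avoids revealing which values a party
    holds; Counter returns 0 for keys a party never saw."""
    keys = [("class", c) for c in classes]
    keys += [("attr", i, v, c)
             for i, dom in enumerate(domains) for v in dom for c in classes]
    per_party = [local_counts(p) for p in parties]
    return {k: secure_sum([lc[k] for lc in per_party]) for k in keys}


def classify(x, counts, classes, domains):
    """Pick the class maximizing log P(c) + sum_i log P(x_i | c), using
    Laplace (add-one) smoothing as an illustrative choice."""
    total = sum(counts[("class", c)] for c in classes)
    best, best_score = None, -math.inf
    for c in classes:
        n_c = counts[("class", c)]
        score = math.log((n_c + 1) / (total + len(classes)))
        for i, v in enumerate(x):
            score += math.log((counts[("attr", i, v, c)] + 1) / (n_c + len(domains[i])))
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy data: two sites holding disjoint rows with the same schema.
CLASSES = ["yes", "no"]
DOMAINS = [["sunny", "rainy", "overcast"], ["hot", "mild"]]
PARTY_A = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
PARTY_B = [(("sunny", "mild"), "yes"), (("overcast", "hot"), "yes"),
           (("sunny", "hot"), "no")]

COUNTS = global_counts([PARTY_A, PARTY_B], CLASSES, DOMAINS)

if __name__ == "__main__":
    print(classify(("sunny", "hot"), COUNTS, CLASSES, DOMAINS))   # no
    print(classify(("rainy", "mild"), COUNTS, CLASSES, DOMAINS))  # yes
```

Naïve Bayes needs only counts, so in this sketch the securely aggregated model is identical to one trained on the pooled rows. What the parties do learn is the aggregate counts themselves.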
