Abstract

Supervised learning techniques such as classification algorithms learn from training data to predict the correct label for newly presented inputs. In many real-world scenarios, the training data such techniques require contains personal information, and collecting it can be a significant problem due to privacy concerns. Cryptographic techniques have previously been used to train on encrypted data, but they are computationally expensive and often do not scale. When a dataset held by another party is to be used for training, differential privacy can preserve the privacy of the individuals in that dataset. When no such dataset exists and data must be collected directly from individuals, local differential privacy can be used instead; it preserves privacy while data is shared with an untrusted data collector. In this work, we propose using local differential privacy techniques to train a Naive Bayes classifier. In the proposed solution, an untrusted party collects perturbed data from individuals in a way that preserves the relationship between feature values and class labels. By estimating the probabilities needed by the Naive Bayes classifier from the perturbed data, the untrusted party can classify new instances with high accuracy. We develop solutions that work for both discrete and continuous data. We also propose utilizing dimensionality reduction techniques to decrease communication cost and improve accuracy. We demonstrate the accuracy of the proposed Naive Bayes classifier achieving local differential privacy via experiments on several datasets, and show how dimensionality reduction enhances the accuracy.
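The pipeline the abstract describes, collecting locally perturbed (feature value, class label) pairs and estimating Naive Bayes probabilities from them, can be illustrated with a minimal sketch. This is not the paper's exact mechanism; it assumes generalized randomized response (GRR), one standard local differential privacy mechanism for categorical data, and all names and the toy data below are hypothetical:

```python
import math
import random
from collections import Counter

def grr_perturb(value, domain, epsilon):
    # Generalized randomized response (GRR): keep the true value with
    # probability p = e^eps / (e^eps + k - 1); otherwise report a
    # uniformly random *other* value from the domain.
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p:
        return value
    return random.choice([v for v in domain if v != value])

def estimate_counts(reports, domain, epsilon):
    # Unbiased count estimation that inverts the GRR perturbation:
    # E[raw_count(v)] = n*q + true_count(v) * (p - q).
    k = len(domain)
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1.0 - p) / (k - 1)
    raw = Counter(reports)
    return {v: (raw.get(v, 0) - n * q) / (p - q) for v in domain}

# Each user encodes their (feature value, class label) pair as a single
# item of the joint domain, so the perturbed report still carries the
# feature-class relationship the classifier needs.
feature_values = ["low", "high"]
class_labels = ["pos", "neg"]
joint_domain = [(f, c) for f in feature_values for c in class_labels]

random.seed(0)
true_data = [("low", "pos")] * 60 + [("high", "neg")] * 40  # toy population
epsilon = 5.0
reports = [grr_perturb(x, joint_domain, epsilon) for x in true_data]
est = estimate_counts(reports, joint_domain, epsilon)

# From the estimated joint counts, the untrusted collector can derive
# the Naive Bayes probabilities, e.g. P(feature = "low" | class = "pos"),
# clamping negative noise-induced estimates to zero.
pos_total = sum(max(est[(f, "pos")], 0.0) for f in feature_values)
p_low_given_pos = max(est[("low", "pos")], 0.0) / pos_total
```

With a reasonably large epsilon the estimated joint counts stay close to the true ones, so the derived conditional probabilities (and hence classification) remain accurate; smaller epsilon values trade accuracy for stronger privacy. Handling continuous features or reducing dimensionality before perturbation, as the abstract mentions, would require additional machinery not shown here.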
