Abstract

During the data privacy process, the utility of datasets diminishes as sensitive information such as personally identifiable information (PII) is removed, transformed, or distorted to achieve confidentiality. The difficulty of attaining an equilibrium between data privacy and utility needs is well documented: trade-offs are required, and making such trade-offs well remains problematic in itself. Given this complexity, in this paper we empirically investigate which parameters could be fine-tuned to achieve an acceptable level of both data privacy and utility during the data privacy process, while making reasonable trade-offs. To this end, we present the comparative classification error gauge (Comparative x-CEG) approach, a data utility quantification concept that employs machine learning classification techniques to gauge data utility based on classification error. In this approach, a privatized dataset is passed through a series of classifiers, each of which returns a classification error, and the classifier with the lowest classification error is chosen. If that error is lower than or equal to a set threshold, better utility might be achieved; otherwise, the data privacy parameters are adjusted and the dataset is re-evaluated with the chosen classifier. The process repeats x times until the desired threshold is reached. The goal is to generate empirical results after a range of parameter adjustments in the data privacy process, from which a threshold level might be chosen for making trade-offs. Our preliminary results show that, given a range of empirical results, it might be possible to choose a trade-off point and publish privacy-compliant data with an acceptable level of utility.
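To make the iteration concrete, the following is a minimal sketch of the Comparative x-CEG loop in Python. It is an illustration only, not the authors' implementation: the `privatize` and `adjust_params` callables are hypothetical placeholders for the data privacy routine and the parameter-adjustment policy, and classification error is estimated here with scikit-learn cross-validation as one plausible choice.

```python
# Minimal sketch of the Comparative x-CEG loop (illustrative, not the
# authors' code). `privatize` and `adjust_params` are hypothetical
# placeholders for the privacy routine and parameter-adjustment policy.
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def comparative_x_ceg(X, y, privatize, adjust_params, params,
                      threshold=0.2, x=10):
    classifiers = [DecisionTreeClassifier(), GaussianNB(),
                   KNeighborsClassifier()]
    for _ in range(x):  # repeat up to x times
        X_priv = privatize(X, params)  # apply the privacy transformation
        # Gauge utility: classification error = 1 - cross-validated accuracy.
        errors = [1.0 - cross_val_score(clf, X_priv, y, cv=5).mean()
                  for clf in classifiers]
        best_error = min(errors)  # classifier with the lowest error wins
        if best_error <= threshold:
            return X_priv, params, best_error  # acceptable utility reached
        # Otherwise adjust the privacy parameters and try again.
        params = adjust_params(params, best_error, threshold)
    return X_priv, params, best_error
```

In a concrete experiment, `privatize` might, for example, apply generalization or noise addition, and `adjust_params` might relax the privacy parameters whenever the error stays above the threshold; both choices are assumptions for the sake of the sketch.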
