Abstract

Micro-databases are datasets that contain person-specific records about individuals. Preserving the privacy of such datasets has become a serious concern, since this massive repository of personalized data is regularly published in the public domain. Sanitization mechanisms are specialized techniques that provide the required privacy guarantees for the published data. This article establishes an efficient framework for quantitatively estimating the effectiveness of any privacy-preservation scheme that employs the anonymization principle. We introduce an information-theoretic metric, termed the Sanitization Degree $(\eta)$, which assigns a cumulative score in the range $[0,1]$ to a generic anonymization process. The design of the proposed metric rests on the fact that any sanitization mechanism attempts to reduce the amount of correlated information among the database attributes while simultaneously preserving the utility of the original dataset. Furthermore, we characterize the privacy-utility tradeoff associated with our model by establishing a working relationship between these two fundamental quantities. We empirically compute the value of $\eta$ for three popular anonymization models ($k$-anonymity, $l$-diversity, and $t$-closeness) over two real-life micro-databases, thereby demonstrating the practicability of our study.
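
The abstract does not reproduce the formal definition of $\eta$; the following Python sketch is only a hedged illustration of the underlying idea, approximating "correlated information removed by sanitization" as a mutual-information ratio between quasi-identifiers and the sensitive attribute. The function names (`sanitization_degree`, `attribute_mi`), the use of scikit-learn's `mutual_info_score`, and the specific ratio form are assumptions for illustration, not the authors' construction.

```python
# Illustrative sketch only: the paper's exact formula for the Sanitization
# Degree (eta) is given in the full text; this proxy measures the fraction
# of quasi-identifier/sensitive-attribute correlation removed by anonymization.
import pandas as pd
from sklearn.metrics import mutual_info_score


def attribute_mi(df: pd.DataFrame, quasi_identifiers: list, sensitive: str) -> float:
    """Sum of mutual information between each quasi-identifier and the sensitive attribute."""
    return sum(mutual_info_score(df[q], df[sensitive]) for q in quasi_identifiers)


def sanitization_degree(original: pd.DataFrame, anonymized: pd.DataFrame,
                        quasi_identifiers: list, sensitive: str) -> float:
    """Assumed proxy score in [0, 1] (not the paper's actual formula).

    0 means no correlated information was removed (no privacy gained);
    1 means all correlation with the sensitive attribute was destroyed.
    """
    mi_orig = attribute_mi(original, quasi_identifiers, sensitive)
    mi_anon = attribute_mi(anonymized, quasi_identifiers, sensitive)
    if mi_orig == 0.0:
        return 0.0  # no correlated information to sanitize
    return min(1.0, max(0.0, 1.0 - mi_anon / mi_orig))


# Toy example: generalizing the ZIP code column (a 2-anonymity-style grouping).
original = pd.DataFrame({
    "zip":     ["47677", "47602", "47678", "47905"],
    "disease": ["flu",   "flu",   "cancer", "flu"],
})
anonymized = original.assign(zip=["476**", "476**", "476**", "479**"])
print(sanitization_degree(original, anonymized, ["zip"], "disease"))  # ~0.85
```

Note that this sketch captures only the privacy side of the tradeoff; the metric described in the abstract also accounts for how much utility of the original dataset is preserved, which the proxy above omits.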
