Abstract

Data distortion is inevitable in privacy-preserving data publication, and many quality metrics have been proposed to measure the quality of anonymous data; information loss metrics are among the most widely used. Most existing information loss metrics, however, are non-semantic and therefore reflect data distortion only to a limited extent. As a result, the utility of anonymous data produced under these metrics is constrained. In this paper, we propose SILM, a novel semantic information loss metric that takes into account the correlation among attributes. SILM captures distortion more precisely than state-of-the-art information loss metrics, especially when strong correlations exist among attributes. We evaluated the effect of SILM on data quality in terms of the accuracy of aggregate query answering and classification. Comprehensive experiments demonstrate that SILM can substantially improve the quality of anonymous data, especially when integrated with suitable anonymization algorithms.
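To illustrate what a non-semantic information loss metric looks like, the following is a minimal sketch of a range-based penalty in the spirit of the Normalized Certainty Penalty (NCP) from prior work. It is not SILM, whose definition is not given here. The sketch assumes purely numeric quasi-identifiers, equal attribute weights, and equivalence classes supplied as lists of records; the function names `domain_widths` and `ncp_table` are hypothetical. Because each attribute is scored on its own, the penalty cannot see correlation between attributes.

```python
from typing import Dict, List

Record = Dict[str, float]


def domain_widths(table: List[Record], attrs: List[str]) -> Dict[str, float]:
    """Width (max - min) of each attribute's domain over the whole original table."""
    return {a: max(r[a] for r in table) - min(r[a] for r in table) for a in attrs}


def ncp_table(classes: List[List[Record]], attrs: List[str],
              widths: Dict[str, float]) -> float:
    """Sum over equivalence classes G of |G| * sum_a (range of a in G / domain width of a).

    Assumption: unweighted attributes. Each attribute is penalised independently,
    so the score never looks at how attributes co-vary within a class.
    """
    total = 0.0
    for group in classes:
        penalty = 0.0
        for a in attrs:
            values = [r[a] for r in group]
            w = widths[a]
            # A constant attribute loses no information when generalised.
            penalty += (max(values) - min(values)) / w if w > 0 else 0.0
        total += len(group) * penalty
    return total


if __name__ == "__main__":
    attrs = ["age", "salary"]
    # Positively correlated table vs. the same marginals with the correlation reversed.
    pos = [{"age": 20, "salary": 30}, {"age": 30, "salary": 50},
           {"age": 40, "salary": 70}, {"age": 50, "salary": 90}]
    neg = [{"age": 20, "salary": 90}, {"age": 30, "salary": 70},
           {"age": 40, "salary": 50}, {"age": 50, "salary": 30}]
    for t in (pos, neg):
        print(ncp_table([t[:2], t[2:]], attrs, domain_widths(t, attrs)))
```

Both tables receive the same score even though their attributes are correlated in opposite directions. This is the kind of blind spot that a correlation-aware, semantic metric is meant to address.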
