A utility based approach for data stream anonymization

Ugur Sopaoglu, Osman Abul

doi:10.1007/s10844-019-00577-6

Abstract

Data streams are well suited to modeling the dynamic, online, fast, and high-volume data requirements of today’s businesses. However, the sensitivity of the data is often an obstacle to deploying data stream applications. To address this challenge, many privacy-preserving models, including k-anonymity, have been adapted to data streams. Existing data stream anonymization frameworks address how to preserve data quality as much as possible under bounded delays. In this work, our main motivation is to minimize the average delay while keeping data quality high. We claim that, in data stream processing tasks, data utility is a function of both data quality and data aging. However, there is a tradeoff between optimizing data aging and optimizing data quality. To this end, we present a tunable data stream k-anonymization framework and an algorithm named UBDSA (Utility Based Approach for Data Stream Anonymization). To attain high-quality anonymity groups, UBDSA also introduces a new distance metric named CAIL (Cardinality Aware Information Loss). Our experimental evaluations compare the performance of UBDSA with existing approaches from the literature, and the results show that it improves on them in both average delay and information loss.

Full Text