Anonymization of Network Traces Data through Condensation-based Differential Privacy

Ahmed Aleroud,Sai Chaithanya Pallaprolu,Fan Yang,George Karabatis,Zhiyuan Chen

doi:10.1145/3425401

Abstract

Network traces are considered a primary source of information to researchers, who use them to investigate research problems such as identifying user behavior, analyzing network hierarchy, maintaining network security, classifying packet flows, and much more. However, most organizations are reluctant to share their data with a third party or the public due to privacy concerns. Therefore, data anonymization prior to sharing becomes a convenient solution to both organizations and researchers. Although several anonymization algorithms are available, few of them allow sufficient privacy (organization need), acceptable data utility (researcher need), and efficient data analysis at the same time. This article introduces a condensation-based differential privacy anonymization approach that achieves an improved tradeoff between privacy and utility compared to existing techniques and produces anonymized network trace data that can be shared publicly without lowering its utility value. Our solution also does not incur extra computation overhead for the data analyzer. A prototype system has been implemented, and experiments have shown that the proposed approach preserves privacy and allows data analysis without revealing the original data even when injection attacks are launched against it. When anonymized datasets are given as input to graph-based intrusion detection techniques, they yield almost identical intrusion detection rates as the original datasets with only a negligible impact.

Full Text