Research and practical development of data anonymization techniques have proliferated in recent years. Although the privacy literature has questioned the efficacy of data anonymization in protecting individuals against harms associated with re-identification, this paper raises a new set of questions: whether anonymization techniques themselves can mask statistical disparities and thus conceal evidence of disparate impact that is potentially discriminatory. If so, the choice to anonymize data to protect privacy, and the specific technique employed, may pick winners and losers. Examining the implications of these choices for the potentially disparate impact of privacy protection on underprivileged sub-populations is thus a critically important policy question.

The paper begins with an interdisciplinary overview of common mechanisms of data anonymization and prevalent types of statistical evidence for disparity. Among data-anonymization mechanisms, the common ones are data removal (e.g., k-anonymity), which aims to remove the parts of a dataset that could potentially identify an individual, and noise insertion (e.g., differential privacy), which inserts carefully designed noise into a dataset to block the identification of individuals while still allowing the accurate recovery of certain summary statistics. Among the types of statistical evidence for disparity, the commonly accepted ones are disparity through separation (e.g., the two or three standard deviations rule for a prima facie case of discrimination), which is grounded in detecting the separation between the outcome distributions of different sub-populations, and disparity through variation (e.g., the more likely than not rule in toxic tort cases), which concentrates on the magnitude of the difference between the mean outcomes of different sub-populations.

We develop a conceptual foundation and a mathematical formalism demonstrating that the data-anonymization mechanisms have distinct impacts on the identifiability of disparity, and that these impacts also vary with how disparity is statistically operationalized. Specifically, under the regime of disparity through separation, data removal tends to produce more false positives (i.e., detecting a disparity when none exists) than false negatives (i.e., failing to detect an existing disparity), whereas noise insertion rarely produces any false positives at all. Under the regime of disparity through variation, noise insertion does produce false positives (as likely as false negatives), whereas the likelihood that data removal produces false positives or false negatives depends on the underlying data distribution.

We empirically validate our findings with an inpatient dataset from one of the five most populous states in the U.S. We examine four data-anonymization techniques (two in the data-removal category and two in the noise-insertion category), ranging from the current rules used by the State of Texas to anonymize its state-wide inpatient discharge dataset to state-of-the-art differential-privacy algorithms for regression analysis. After presenting the empirical results, which confirm our conceptual and mathematical findings, we conclude the paper by discussing the business and policy implications of these findings, highlighting the need for firms and policy makers to balance the protection of privacy against the recognition and rectification of disparate impact.
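To make the interplay between anonymization and disparity detection concrete, the sketch below illustrates the two-standard-deviations test applied to synthetic outcomes before and after two stylized anonymization steps: a removal-style suppression and Laplace noise insertion on released counts. It is a minimal illustration only, not the paper's algorithms or data; the group sizes, outcome rates, suppression share, and privacy parameter are all hypothetical.

```python
# Illustrative sketch (hypothetical groups, rates, and epsilon): how the
# two-standard-deviations rule for disparity can be evaluated on raw data,
# on data after removal-style anonymization, and on noise-inserted counts.
import numpy as np

rng = np.random.default_rng(0)

def two_sd_rule(pos_a, n_a, pos_b, n_b):
    """z-statistic for the difference in outcome rates between two groups.
    A magnitude above roughly 2-3 is the classic prima facie threshold."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    p_pool = (pos_a + pos_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Synthetic outcomes for two sub-populations with a true 5-point disparity.
n_a, n_b = 4000, 800
outcomes_a = rng.binomial(1, 0.20, n_a)   # adverse-outcome rate 20%
outcomes_b = rng.binomial(1, 0.25, n_b)   # adverse-outcome rate 25%

print("raw data:      z =", round(two_sd_rule(outcomes_a.sum(), n_a,
                                              outcomes_b.sum(), n_b), 2))

# Data removal, crudely approximated by dropping a share of the smaller
# group's records, as if their quasi-identifier combinations fell below a
# k-anonymity group-size threshold.
keep_b = rng.random(n_b) > 0.5
kept_b = outcomes_b[keep_b]
print("after removal: z =", round(two_sd_rule(outcomes_a.sum(), n_a,
                                              kept_b.sum(), kept_b.size), 2))

# Noise insertion via the Laplace mechanism on the released counts (a
# standard differential-privacy building block); epsilon = 1.0 is arbitrary.
eps = 1.0
noisy_pos_a = outcomes_a.sum() + rng.laplace(0, 1 / eps)
noisy_pos_b = outcomes_b.sum() + rng.laplace(0, 1 / eps)
print("after noise:   z =", round(two_sd_rule(noisy_pos_a, n_a,
                                              noisy_pos_b, n_b), 2))
```

The point of the sketch is only that the test statistic used as evidence of disparity can move once anonymization is applied, and that the direction and size of the movement depend on which mechanism is chosen.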
In sum, our paper identifies an important knowledge gap at the intersection of technology and law: whether data-anonymization technologies themselves can mask statistical disparities and thus conceal evidence of disparate impact that is potentially discriminatory. The emergence of privacy laws (e.g., the GDPR) lends urgency to this question, because if such disparate impacts do exist, legislators and regulators would essentially be picking winners and losers by requiring or incentivizing the use of data-anonymization techniques. This paper tackles this complex challenge, which is especially timely given the current public discourse in the U.S. about racial discrimination and the worldwide trend of prioritizing the protection of consumer privacy in legislation and regulation.