Privacy has become a pressing challenge in both information theory and computer science due to the massive (centralized) collection of user data. In this article, we survey privacy-preserving mechanisms and metrics through the lens of information theory, and unify different privacy metrics, including f-divergences, Rényi divergences, and differential privacy (DP), in terms of the likelihood ratio (and its logarithm). We review recent progress on the design of privacy-preserving mechanisms under these metrics, both in computer science, where DP is the standard privacy notion and controls how much the output distribution can shift under a small input perturbation, and in information theory, where privacy is guaranteed by minimizing information leakage. In particular, for DP, we cover its important variants (e.g., Rényi DP, Pufferfish privacy) and key properties, discuss its connections with information-theoretic quantities, and provide operational interpretations of its additive-noise mechanisms. For information-theoretic privacy, we cover notable frameworks, ranging from the privacy funnel, which originates from rate-distortion theory and the information bottleneck, to privacy guarantees against statistical inference/guessing, and information obfuscation of samples and features. Finally, we discuss implementations of these privacy-preserving mechanisms in current data-driven machine learning scenarios, including deep learning, information obfuscation, federated learning, and dataset sharing.
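As a concrete illustration of the additive-noise mechanisms mentioned above, here is a minimal sketch of the classical Laplace mechanism, which achieves ε-DP by ensuring Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S] for any neighboring datasets D, D'. The function name, the numeric example, and the choice of NumPy are illustrative assumptions, not drawn from the article itself; the sketch assumes a real-valued query with known global ℓ1-sensitivity.

```python
import numpy as np

def laplace_mechanism(query_value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator | None = None) -> float:
    """Release query_value with Laplace noise calibrated for epsilon-DP.

    Assumes `sensitivity` is the global l1-sensitivity of the query,
    i.e., the largest change in the query value between any two
    neighboring datasets (differing in one record).
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon  # noise scale b = Delta / epsilon
    return query_value + rng.laplace(loc=0.0, scale=scale)

# Hypothetical example: privately release a count query (sensitivity 1)
# with a privacy budget of epsilon = 0.5.
private_count = laplace_mechanism(query_value=42.0, sensitivity=1.0, epsilon=0.5)
```

Smaller ε forces a larger noise scale, directly trading utility for a tighter bound on the likelihood ratio between the output distributions on neighboring inputs, which is exactly the quantity the unified view in this article is built on.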