Analysis of privacy preserving random perturbation techniques

Haimonti Dutta, Souptik Datta, Krishnamoorthy Sivakumar, Hillol Kargupta

doi:10.1145/1005140.1005145

Abstract

Privacy is becoming an increasingly important issue in many data mining applications, particularly in the security and defense area. This has triggered the development of many privacy-preserving data mining techniques. A large fraction of them use randomized data distortion to mask the data and preserve privacy: they attempt to hide sensitive data by randomly modifying the data values with additive noise. This paper questions the utility of such randomized data distortion techniques for preserving privacy in many cases and urges caution. It notes that random objects (particularly random matrices) have predictable structures in the spectral domain, and it offers a random matrix-based spectral filtering technique to retrieve the original data from a data set distorted by adding random values. It extends our earlier work, which questioned the efficacy of additive-noise random perturbation techniques for data mining in the continuous-valued domain, and presents new results in the discrete domain. It shows that the growing collection of random perturbation-based privacy-preserving data mining techniques may need careful scrutiny in order to prevent privacy breaches through linear transformations. The paper also presents extensive experimental results to support this claim.
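The idea of spectral filtering can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions that the abstract does not state: the noise is i.i.d. Gaussian with a known variance, the original data is approximately low rank, and the noise eigenvalues of the sample covariance are cut off at the Marchenko-Pastur upper edge. The names `spectral_filter` and `demo` and all parameter values are hypothetical.

```python
import numpy as np


def spectral_filter(perturbed, noise_var):
    """Estimate original data from perturbed = original + additive noise.

    Assumes (for illustration only) i.i.d. noise with known variance.
    Returns the estimate and the number of eigenvectors kept.
    """
    m, n = perturbed.shape  # m records, n attributes
    mean = perturbed.mean(axis=0)
    centered = perturbed - mean

    # Sample covariance of the perturbed data and its eigen-decomposition.
    cov = centered.T @ centered / m
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Marchenko-Pastur upper edge: eigenvalues of a pure-noise covariance
    # matrix concentrate below this bound when m and n are large.
    lam_max = noise_var * (1.0 + np.sqrt(n / m)) ** 2

    # Keep only eigenvectors whose eigenvalues exceed the noise bound
    # and project the perturbed data onto their span.
    signal_vecs = eigvecs[:, eigvals > lam_max]
    estimate = centered @ signal_vecs @ signal_vecs.T + mean
    return estimate, signal_vecs.shape[1]


def demo(seed=0):
    """Synthetic check: low-rank data hidden by unit-variance noise."""
    rng = np.random.default_rng(seed)
    m, n, k = 2000, 40, 3  # hypothetical sizes: records, attributes, rank
    original = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))
    noise = rng.normal(scale=1.0, size=(m, n))
    perturbed = original + noise

    estimate, kept = spectral_filter(perturbed, noise_var=1.0)
    mse_perturbed = np.mean((perturbed - original) ** 2)
    mse_filtered = np.mean((estimate - original) ** 2)
    return mse_perturbed, mse_filtered, kept


if __name__ == "__main__":
    mse_perturbed, mse_filtered, kept = demo()
    print(f"kept {kept} eigenvectors")
    print(f"MSE of perturbed data: {mse_perturbed:.3f}")
    print(f"MSE after filtering:   {mse_filtered:.3f}")
```

On synthetic data of this kind, the filtered estimate should sit much closer to the original than the perturbed data does, which is the sense in which additive noise alone may fail to hide correlated data. Real data sets are rarely exactly low rank, so the separation between signal and noise eigenvalues is typically less clean than in this sketch.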

Full Text