Abstract

The protection of private data is a pressing research issue in the era of big data. Differential privacy provides a strong privacy guarantee for data analysis. In this paper, we propose DP-MSNM, a parametric density estimation algorithm that brings the multivariate skew-normal mixture (MSNM) model under differential privacy. MSNM can handle asymmetric data sets, and it can approximate arbitrary distributions when fitted with the expectation-maximization (EM) algorithm. In this model, we add two extra steps on the estimated parameters in the M step of each iteration. The first step adds calibrated noise to the estimated parameters via the Laplace mechanism. The second step post-processes those noisy parameters to restore their intrinsic characteristics, normalizing the weight vector and projecting covariance matrices back to positive semi-definiteness. Extensive experiments on real data sets evaluate the performance of DP-MSNM and demonstrate that the proposed method outperforms DPGMM.

Highlights

  • The protection of private data is a hot research issue in the era of big data

  • Most existing work considers the Gaussian mixture model (GMM) [1,2], but real data can present skewness or heavy-tailed behavior

  • We need to address several problems in the skew-normal mixture model: (1) skew-normal distributions compose the mixture, (2) it is unclear which component a sample belongs to, (3) only perturbed data are available, and (4) we want estimates of all the parameters. To address these problems, we propose DP-multivariate skew-normal mixtures (DP-MSNM), which appends two extra steps to the standard MSNM density-estimation procedure in each iteration
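The two extra steps described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function names, the eigenvalue-clipping projection to the positive semi-definite cone, and the clip-and-renormalize treatment of the weight vector are all assumptions about how the post-processing might be realized.

```python
import numpy as np

def project_psd(mat, floor=1e-8):
    """Post-process a noisy scale/covariance matrix: symmetrize it and
    clip negative eigenvalues so the result is positive semi-definite."""
    sym = (mat + mat.T) / 2.0
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, floor, None)
    return vecs @ np.diag(vals) @ vecs.T

def normalize_weights(w):
    """Post-process a noisy mixture-weight vector: clip negative entries
    and renormalize so the weights form a valid probability vector."""
    w = np.clip(w, 0.0, None)
    total = w.sum()
    if total == 0.0:
        # Degenerate case after noise: fall back to uniform weights.
        return np.full_like(w, 1.0 / len(w))
    return w / total
```

For example, a noisy weight vector such as `[0.5, -0.2, 0.7]` would be clipped and rescaled into a valid probability vector, and a noisy covariance matrix with a negative eigenvalue would be mapped to its nearest PSD counterpart under this clipping scheme.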


Introduction

The protection of private data is a pressing research issue in the era of big data. Differential privacy provides a strong privacy guarantee for data analysis. Differentially private density estimation has previously been studied for mixture models. The noise-adding step injects classical Laplace noise into the estimated parameters in each iteration to achieve differential privacy, so the resulting noisy vectors and matrices follow a Laplace distribution.
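The Laplace noise-adding step can be sketched as follows, assuming element-wise noise with scale sensitivity/epsilon applied to each estimated parameter; the function `laplace_mechanism` and its signature are illustrative assumptions, not an API from the paper.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Perturb a parameter vector or matrix with element-wise Laplace
    noise of scale sensitivity/epsilon (the classical Laplace mechanism)."""
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=np.shape(value))
```

In each EM iteration the estimated means, scale matrices, skewness parameters, and mixture weights would each be passed through such a call, with the privacy budget epsilon split across the parameters.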

