Abstract

Data analysis is expected to provide accurate descriptions of the data. However, this requirement conflicts with privacy when working with sensitive data. In this case, there is a need to ensure that releasing the data analysis results does not disclose sensitive information. Privacy-preserving data analysis has therefore become significant. Enforcing strict privacy guarantees, as in differential privacy, can significantly distort the data or the results of the data analysis, thus limiting their analytical utility. In an attempt to address this issue, in this paper we discuss how "integral privacy", a re-sampling based privacy model, can be used to compute descriptive statistics of a given dataset with high utility. In integral privacy, privacy is achieved through the notion of stability, which leads to releasing the data analysis result that is least susceptible to changes in the input dataset. Here, stability is measured by the relative frequency of the different generators (re-samples of the data) that lead to the same data analysis result. In this work, we compare the results of integrally private statistics across different theoretical data distributions and real-world data with differing parameters. Moreover, the results are compared with statistics obtained through differential privacy. Finally, through empirical analysis, we show that the integral privacy based approach has higher utility and robustness than differential privacy. Given the computational complexity of the method, we propose integral privacy as more suitable for small datasets, where differential privacy performs poorly. Adopting a more efficient re-sampling mechanism could further improve the computational cost of integral privacy.
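
To make the mechanism concrete, the following is a minimal sketch of how an integrally private mean might be computed: draw many re-samples (generators) of the data, compute the statistic on each, discretize the results, and release the value that recurs across the most generators. The function and parameter names (integrally_private_mean, n_resamples, sample_frac, precision) and the simple subsampling scheme are our illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def integrally_private_mean(data, n_resamples=1000, sample_frac=0.5,
                            precision=2, rng=None):
    """Illustrative sketch of an integrally private mean.

    Each re-sample of the data acts as a 'generator'; the released
    value is the discretized mean produced by the largest number of
    distinct generators, i.e., the most stable result. All parameter
    values here are assumptions for illustration.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    k = max(1, int(sample_frac * len(data)))
    counts = {}
    for _ in range(n_resamples):
        sample = rng.choice(data, size=k, replace=False)  # one generator
        stat = round(float(sample.mean()), precision)     # discretize result
        counts[stat] = counts.get(stat, 0) + 1
    # release the statistic supported by the most generators
    return max(counts, key=counts.get)
```

The cost of enumerating many re-samples is what makes the approach computationally heavy, which is why the paper positions it for small datasets.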

Highlights

  • We study how to apply integral privacy (IP) in order to compute descriptive statistics

  • We evaluate the effectiveness of our approach when computing a set of descriptive statistics

  • In the case of mean() computation, differential privacy (DP) results show low variability compared to IP

Introduction

Privacy-preserving data analysis has become a strong requirement with the use of sensitive data in data analysis. The privacy requirement is that no analysis performed on sensitive data should lead to any disclosure of sensitive information. Several definitions of what privacy means have been introduced. They are computational definitions that permit us to build algorithms providing solutions that satisfy these privacy guarantees. Examples of such definitions include k-anonymity and differential privacy.
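
As a point of reference for the comparison that follows, this is a minimal sketch of the standard Laplace mechanism for an epsilon-differentially private mean. The function name, clipping bounds, and parameters are our assumptions for illustration; the paper's DP baseline may be configured differently.

```python
import numpy as np

def dp_mean_laplace(data, epsilon, lower, upper, rng=None):
    """Illustrative epsilon-DP mean via the Laplace mechanism.

    Values are clipped to [lower, upper], so the sensitivity of the
    mean is (upper - lower) / n; adding Laplace noise with scale
    sensitivity / epsilon yields epsilon-differential privacy.
    """
    rng = np.random.default_rng(rng)
    x = np.clip(np.asarray(data, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(x)
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)
```

Note that for small n the noise scale (upper - lower) / (n * epsilon) grows large, which is consistent with the paper's observation that DP performs poorly on small datasets.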
