Abstract

With the rapid development of information technology, large volumes of personal data, including data collected by sensors and IoT devices, are stored in clouds or data centers. In some cases, the owners of these clouds or data centers need to publish the data. How to make the best use of such data while limiting the risk of personal information leakage has therefore become a popular research topic. The most common approach to data privacy protection is data anonymization, which has two main problems: (1) the availability of the information after clustering is reduced and cannot be adjusted flexibly; (2) most methods are static, so when the data is released multiple times, personal privacy can be leaked. To address these problems, this article makes two contributions. The first is a new micro-aggregation-based clustering method in which data availability and privacy protection can be adjusted flexibly by jointly considering attribute distance and information entropy. The second is a dynamic update mechanism that guarantees that individual privacy is not compromised after multiple releases of the data while minimizing information loss. Finally, the algorithm is evaluated on real data sets, and its availability and advantages are demonstrated by measuring running time, average information loss, and the number of forged records.
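To make the clustering criterion concrete, the following is a minimal sketch of micro-aggregation-style clustering that scores candidate records by a weighted trade-off between quasi-identifier distance (utility) and sensitive-attribute entropy (privacy). The weighting knob `alpha`, the helper names, the toy records, and the group size `k` are illustrative assumptions, not the exact algorithm proposed in the paper.

```python
import math
from collections import Counter

def entropy(sensitive_values):
    """Shannon entropy of the sensitive attribute within a group."""
    counts = Counter(sensitive_values)
    total = len(sensitive_values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def distance(a, b):
    """Euclidean distance over numeric quasi-identifiers (e.g., age, zip prefix)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a["qi"], b["qi"])))

def micro_aggregate(records, k=3, alpha=0.5):
    """Greedily build groups of at least k records.

    Each candidate is scored by alpha * distance-to-seed minus
    (1 - alpha) * entropy gain of the group's sensitive attribute,
    so lower scores favor close records that diversify the group.
    `alpha` is an assumed knob trading utility against privacy.
    """
    remaining = list(records)
    groups = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        group = [seed]
        while len(group) < k and remaining:
            def score(r):
                ent_gain = entropy([g["sensitive"] for g in group] + [r["sensitive"]])
                return alpha * distance(seed, r) - (1 - alpha) * ent_gain
            best = min(remaining, key=score)
            remaining.remove(best)
            group.append(best)
        groups.append(group)
    if remaining and groups:  # fold leftover records into the last group
        groups[-1].extend(remaining)
    return groups

records = [
    {"qi": (25, 130), "sensitive": "flu"},
    {"qi": (27, 131), "sensitive": "cancer"},
    {"qi": (52, 478), "sensitive": "flu"},
    {"qi": (55, 479), "sensitive": "hepatitis"},
    {"qi": (24, 129), "sensitive": "gastritis"},
    {"qi": (53, 480), "sensitive": "flu"},
]
for i, g in enumerate(micro_aggregate(records, k=3)):
    print(i, [r["sensitive"] for r in g])
```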

Highlights

  • With the rapid development of the Internet and Big Data, concern about protecting personal privacy is on the rise, and it has become one of the most popular research areas

  • A Laplace noise mechanism is applied to protect the sensitive attributes of the result set (see the sketch after this list)

  • L-diversity, built on k-anonymity, handles sensitive attributes by ensuring that each equivalence group contains at least L distinct sensitive values; however, if those values are concentrated on a few particular attributes, attackers can still infer an individual's sensitive value with high probability, which is why the model called L-diversity P-sensitive [23,24] came into being
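The Laplace mechanism referenced above adds noise drawn from Lap(Δf/ε) to a numeric query result so that the published value satisfies ε-differential privacy. Below is a minimal sketch using inverse-CDF sampling; the sensitivity value, the budget ε = 0.5, and the function name are illustrative assumptions, not the paper's exact parameters.

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return true_value perturbed by Laplace(0, sensitivity / epsilon) noise.

    Inverse-CDF sampling: if u ~ Uniform(-0.5, 0.5), then
    -b * sign(u) * ln(1 - 2|u|) follows a Laplace(0, b) distribution.
    """
    b = sensitivity / epsilon
    u = random.random() - 0.5
    noise = -b * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_value + noise

# Example: protect a count over a group's sensitive attribute.
# Counting queries have sensitivity 1 (adding or removing one record
# changes the count by at most 1); epsilon = 0.5 is an assumed budget.
true_count = 42
noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5)
print(noisy_count)
```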


Summary

Introduction

As the Internet and Big Data have been rapidly developing, concern about protecting personal privacy, which has become one of the most popular research areas, is on the rise. The goal is to prevent a potential attacker from deducing accurate personal sensitive data, thereby achieving privacy protection. The key to such generalization methods is the design of aggregation (clustering) [7,8] rules, which should guarantee data usability while protecting individual privacy. For example, some hospitals make diagnosis records (left part of Table 1) publicly available to related scientific research organizations. In this example, every piece of individual data is assumed to be presented as a tuple (one row of the table) with a fixed structure, including Name, Age, Zip Code, and Disease Type. In dynamic generalization methods, for instance M-invariance, forged data is added to ensure the consistency of multiple publications.
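To illustrate how the quasi-identifiers Age and Zip Code can be generalized into a shared equivalence class while the sensitive Disease Type is retained, here is a minimal sketch; the example values and the generalization granularity are illustrative assumptions, not the hospital data of Table 1.

```python
# Each tuple mirrors the assumed record structure (Name, Age, Zip Code, Disease Type).
records = [
    ("Alice", 26, "13053", "Flu"),
    ("Bob",   28, "13068", "Hepatitis"),
    ("Carol", 25, "13053", "Gastritis"),
]

def generalize(record):
    """Suppress the identifier, coarsen Age to a decade range, mask the Zip Code suffix."""
    _, age, zip_code, disease = record
    age_range = f"[{age // 10 * 10}-{age // 10 * 10 + 9}]"
    masked_zip = zip_code[:3] + "**"
    return ("*", age_range, masked_zip, disease)

for r in records:
    print(generalize(r))
# ('*', '[20-29]', '130**', ...): all three rows now share one equivalence
# class on the quasi-identifiers, so no row is re-identifiable by
# (Age, Zip Code) alone, yet the Disease Type remains usable for research.
```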

Related Work
Clustering Model Based on Micro Aggregation
The Measurement of Tuples' Attribute Distance
The Measurement of Sensitive Attribute Entropy
Micro Aggregation Clustering Algorithm Description
Dynamic Update Based on Micro Aggregation
Dynamic Adjustment after Micro-Aggregation Clustering
Dynamic Protection of Sensitive Attributes
The Laplace Noise Mechanism
Experiment and Result Analysis
Conclusions and Future Work