Abstract

Association analysis is a critical data analysis task that aims to find all co-occurrence relationships (i.e., frequent itemsets or confident association rules) in a transactional dataset. Association rules help users discover patterns and develop corresponding strategies. The data analysis process can be summarized as a set of queries, where each query is a real-valued function of the dataset. However, unless restrictions and protections are implemented, accessing the dataset to answer the queries may disclose the private information of individuals. In this paper, we propose an original differentially private association rules mining (DPARM) algorithm, which uses multiple support thresholds to reduce the number of candidate itemsets while reflecting the real nature of the items, and uses random truncation and uniform partition to reduce the dimensionality of the dataset. Both approaches help reduce the sensitivity of the queries, which dramatically reduces the scale of the required noise and improves the utility of the mining results. We also significantly stabilize the noise scale by adaptively allocating the privacy levels, and we bound the overall privacy loss. Through a series of experiments, we show that our DPARM algorithm outperforms existing approaches in mining accuracy while satisfying differential privacy. To the best of our knowledge, our work is the first DPARM algorithm to adopt multiple support thresholds while using a set of elaborated approaches to bound the overall privacy loss of the mining process.
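
As a rough illustration of why truncation reduces the required noise, the sketch below perturbs the support counts of size-k candidate itemsets with Laplace noise. It is not the DPARM algorithm itself: the function names, the parameter max_len, and the sensitivity bound min(|candidates|, C(max_len, k)) are assumptions made for this example, and the uniform partition and adaptive allocation of privacy levels are omitted.

```python
import math
import random

def truncate(transaction, max_len, rng):
    """Randomly keep at most `max_len` items of one transaction."""
    items = sorted(transaction)
    if len(items) <= max_len:
        return set(items)
    return set(rng.sample(items, max_len))

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_support_counts(transactions, candidates, k, max_len, epsilon, rng):
    """Return noisy counts for size-k candidates, plus the noise scale used.

    Assumption: after truncation a transaction holds at most `max_len` items,
    so adding or removing it changes at most C(max_len, k) candidate counts.
    """
    truncated = [truncate(t, max_len, rng) for t in transactions]
    sensitivity = min(len(candidates), math.comb(max_len, k))
    scale = sensitivity / epsilon
    noisy = {}
    for candidate in candidates:
        true_count = sum(set(candidate) <= t for t in truncated)
        noisy[candidate] = true_count + laplace_noise(scale, rng)
    return noisy, scale

# Hypothetical toy data: the longest transaction has 5 items.
rng = random.Random(0)
transactions = [
    {"a", "b", "c", "d", "e"},
    {"a", "b"},
    {"b", "c", "d"},
    {"a", "c"},
]
candidates = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
noisy, scale = noisy_support_counts(
    transactions, candidates, k=2, max_len=3, epsilon=1.0, rng=rng
)
```

With max_len = 3 the bound on the sensitivity is C(3, 2) = 3, whereas the untruncated 5-item transaction could affect all 6 candidates, so the noise scale is halved at the cost of discarding some items.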

Highlights

  • With the development of information technology and the popularity of smart devices, the volume of data generated by humans has substantially grown in scale

  • A tiny part of it is visible at first sight, while much of it is hidden beneath the surface. Data analysis techniques, such as data mining and machine learning, are powerful tools for exploring the iceberg

  • Data mining and machine learning are two common data analysis techniques, but their application scenarios differ somewhat: Data mining emphasizes the discovery of useful knowledge, whereas machine learning focuses on predicting unknown entities on the basis of associations

Summary

INTRODUCTION

With the development of information technology and the popularity of smart devices, the volume of data generated by humans has substantially grown in scale. Mining with a single support threshold leads to a dilemma known as the “rare item problem”: a high threshold misses rules that involve rare items, whereas a low threshold produces an excessive number of candidate itemsets. In real-life applications, some items naturally carry more weight than others. Researchers have addressed this problem by allowing users to specify “multiple support thresholds.” In brief, rare itemset mining is a more advanced setting of frequent itemset mining that allows users to apply a different threshold to each item. These efforts motivated us to develop a novel association rules algorithm that guarantees both privacy and utility by using multiple support thresholds.
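
The idea of multiple support thresholds can be sketched as follows. This is a minimal, non-private example that assumes the common convention in which an itemset is frequent when its support reaches the smallest minimum item support (MIS) among its items; the summary does not confirm that DPARM uses exactly this rule, and the function names, toy transactions, and MIS values are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, mis, max_size=2):
    """Brute-force search for itemsets meeting their item-specific threshold.

    Assumed rule: the threshold of an itemset is the smallest MIS value
    among the items it contains.
    """
    items = sorted(mis)
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            threshold = min(mis[i] for i in candidate)
            s = support(candidate, transactions)
            if s >= threshold:
                result[candidate] = s
    return result

# Hypothetical toy data: "caviar" is rare but still of interest.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "caviar"},
]
mis = {"bread": 0.5, "milk": 0.5, "caviar": 0.15}
found = frequent_itemsets(transactions, mis)
```

Here a single threshold of 0.5 would discard every itemset containing caviar, while the lower MIS for caviar keeps the pair (bread, caviar) without lowering the bar for itemsets made up only of common items.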

PRELIMINARIES
COMPUTATIONAL MODEL
A DPARM algorithm mainly involves two steps:
DPARM ALGORITHM
Output
MIS ASSIGNMENT AND SUPPORT COUNTING
MISCELLANEOUS
PRIVACY ANALYSIS
Findings