Secure Mining of Association Rules in Distributed Datasets

Qilong Han, Dan Lu, Kejia Zhang, Haitao Zhang, Hongtao Song

doi:10.1109/access.2019.2948033

Abstract

The arrival of the Information Age, with its rapid development of information technology, has opened up wide scope for data analysis and mining. Yet growth in this market could be held back by privacy concerns. This paper addresses the problem of secure association rule mining where transactions are distributed across sources. Existing solutions for vertically and horizontally partitioned distributed data suffer from high encryption complexity and incomplete definitions of the attributes held by multiple parties. In this paper, we study how to maintain differential privacy in distributed databases when mining association rules, without revealing each party's raw transactions, regardless of how strong the attackers' background knowledge is. We use an intermediate server for data consolidation without assuming that it is safe. Our methods offer enhanced privacy against various attack models. In addition, they are simpler and significantly more efficient in terms of communication rounds and computation overhead.

Highlights

Data mining, at its core, is the transformation of large amounts of data into meaningful patterns and rules
We study the mining of association rules in distributed datasets using a semi-trusted intermediate server
We propose an alternative protocol for the secure mining of association rules in distributed databases

Summary

INTRODUCTION

Data mining, at its core, is the transformation of large amounts of data into meaningful patterns and rules. In the multi-party computation problem, there are M participants that hold private transactions (x1, x2, ...).

FRAMEWORK AND MINING ALGORITHMS

We present the overall framework of secure mining of association rules in distributed datasets and propose two discovery algorithms for obtaining 1-frequent itemsets safely under two attack models. In the ARMS model, the third party performs the same operations to send some initialized data to A. A adds Laplace noise to the real statistical result of each candidate 1-frequent item, uses these noisy counts as the node values of a noisy FP-tree, and sends the noisy FP-tree to B. The third party can obtain the final statistics of each index but cannot learn the real item represented by the index value.
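The noise-addition step performed by A can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only shows the standard Laplace mechanism applied to per-item counts, and it does not build the FP-tree or model the exchange with B and the third party. The function names (`laplace_noise`, `noisy_item_counts`, `frequent_items`), the sample transactions, and the choice of sensitivity 1 per count are assumptions made for illustration.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_item_counts(transactions, epsilon, sensitivity=1.0, seed=None):
    """Return {item: true_count + Laplace noise} for every single item.

    Assumption: each count has sensitivity `sensitivity`, and `epsilon` is
    the privacy budget spent on each count (budget splitting is not modeled).
    """
    rng = random.Random(seed)
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    scale = sensitivity / epsilon
    return {item: c + laplace_noise(scale, rng)
            for item, c in sorted(counts.items())}


def frequent_items(noisy_counts, min_support):
    # Keep the items whose noisy count reaches the support threshold.
    return sorted(i for i, c in noisy_counts.items() if c >= min_support)


# Hypothetical local transactions held by party A.
transactions = [
    ["bread", "milk"],
    ["bread", "beer"],
    ["milk", "beer", "bread"],
    ["milk"],
]
noisy = noisy_item_counts(transactions, epsilon=1.0, seed=0)
candidates = frequent_items(noisy, min_support=2)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of less accurate counts. In the scheme described above, the noisy counts would serve as node values of the noisy FP-tree rather than being thresholded directly.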

Algorithm Association Rules Mining Under HC

Findings

Privacy Analysis