Privacy preserving association rule mining over distributed databases using genetic algorithm

Bettahally N Keshavamurthy, Durga Toshniwal, Asad M Khan

doi:10.1007/s00521-013-1343-9

Abstract

Privacy preservation in distributed databases is an active area of research. With the advancement of technology, massive amounts of data are continuously being collected and stored in distributed database applications. Temporal associations and correlations among items in the large transactional datasets of a distributed database can support many business decision-making processes. One way to uncover them is to mine frequent itemsets and compute their association rules, which is a nontrivial problem. In a typical situation, multiple parties may wish to collaborate to extract interesting global information, such as frequent associations, without revealing their respective data to each other. This may be particularly useful in applications such as retail market basket analysis, medical research, and academia. In the proposed work, we aim to find frequent items and to develop a global association rule model based on the genetic algorithm (GA). The GA is used because of its inherent features: it is robust with respect to local maxima/minima, and it is a domain-independent technique for searching large spaces to find exact or approximate solutions to optimization and search problems. For privacy preservation of the data, the concept of a trusted third party with two offsets is used. The data are first anonymized at the local party end, and then the aggregation and global association are performed by the trusted third party. The proposed algorithms address various types of partitions: horizontal, vertical, and arbitrary.
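
As a rough illustration of the aggregation step for horizontally partitioned data, the sketch below shows how a third party could combine per-party itemset counts into a global support and confidence for a rule. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names (`local_counts`, `aggregate`, `rule_metrics`) and the toy transactions are hypothetical, exhaustive counting stands in for the GA-based search, and the anonymization with two offsets is omitted because the abstract does not specify how the offsets are applied.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items in one party's data.

    Hypothetical helper: exhaustive counting replaces the GA-based search.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts, len(transactions)


def aggregate(reports):
    """Sum the per-party counts and transaction totals (third-party step).

    Assumption: counts arrive unmasked; the offset scheme is not modeled.
    """
    total = Counter()
    n = 0
    for counts, size in reports:
        total.update(counts)
        n += size
    return total, n


def rule_metrics(total, n, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    a = tuple(sorted(antecedent))
    both = tuple(sorted(set(antecedent) | set(consequent)))
    support = total[both] / n
    confidence = total[both] / total[a] if total[a] else 0.0
    return support, confidence


# Toy data for two hypothetical parties holding disjoint sets of transactions.
party_a = [["bread", "milk"], ["bread", "butter"], ["milk"]]
party_b = [["bread", "milk", "butter"], ["bread", "milk"]]

total, n = aggregate([local_counts(party_a), local_counts(party_b)])
support, confidence = rule_metrics(total, n, ["bread"], ["milk"])
print(support, confidence)
```

In this sketch only itemset counts and transaction totals leave each party, never the raw transactions; a privacy-preserving variant would additionally mask those counts before they are sent.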

Full Text