The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion

Chun-Wei Lin,Kuo-Tung Yang,Shyue-Liang Wang,Tzung-Pei Hong

doi:10.1007/s10489-014-0590-5

Abstract

Data mining technology is used to extract useful knowledge from very large datasets, but the process of data collection and data dissemination may result in an inherent threat to privacy. Some sensitive or private information concerning individuals, businesses and organizations has to be suppressed before it is shared or published. Privacy-preserving data mining (PPDM) has become an important issue in recent years. In the past, many heuristic approaches were developed to sanitize databases for the purpose of hiding sensitive information in PPDM, but data sanitization of PPDM is considered to be an NP-hard problem. It is critical to find the balance between privacy protection for hiding sensitive information and maintaining the discovery of knowledge, or even reducing artificial knowledge in the sanitization process. In this paper, a GA-based framework with two optimization algorithms is proposed for data sanitization. A novel evaluation function with three concerned factors is designed to find the appropriate transactions to be deleted in order to hide sensitive itemsets. Experiments are then conducted to evaluate the performance of the proposed GA-based algorithms with regard to different factors such as the execution time, the number of hiding failures, the number of missing itemsets, the number of artificial itemsets, and database dissimilarity.

Full Text