Abstract

Data mining extracts meaningful and useful information or knowledge from very large databases. Because data mining techniques can also uncover confidential or private information, they carry an inherent risk to privacy. Privacy-preserving data mining (PPDM) has therefore arisen in recent years to sanitize the original database so that sensitive information is hidden; the sanitization process can be regarded as an NP-hard problem. This paper proposes a compact prelarge GA-based algorithm (cpGA2DT) that deletes transactions to hide sensitive itemsets. It addresses the limitations of the evolutionary process by adopting both the compact GA (cGA) mechanism and the prelarge concept. A flexible fitness function with three adjustable weights is designed to find appropriate transactions to delete, so that sensitive itemsets are hidden with minimal side effects: hiding failure, missing cost, and artificial cost. Experiments compare the proposed cpGA2DT algorithm with the simple GA-based (sGA2DT) algorithm and a greedy approach in terms of execution time and the three side effects.
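The three side effects can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's implementation: it takes hiding failure to be the number of sensitive itemsets that remain frequent after sanitization, missing cost to be the number of non-sensitive frequent itemsets that are lost, and artificial cost to be the number of itemsets that become frequent only after deletion. The weighted sum, the default weights, and the names `frequent_itemsets`, `side_effects`, and `fitness` are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_ratio):
    """Brute-force frequent itemset enumeration (suitable for toy data only)."""
    n = len(transactions)
    if n == 0:
        return set()
    items = sorted({item for t in transactions for item in t})
    result = set()
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            cand_set = frozenset(cand)
            count = sum(1 for t in transactions if cand_set <= t)
            if count / n >= min_ratio:
                result.add(cand_set)
    return result


def side_effects(db, deleted, sensitive, min_ratio):
    """Return (hiding_failure, missing_cost, artificial_cost) for a deletion set."""
    before = frequent_itemsets(db, min_ratio)
    sanitized = [t for idx, t in enumerate(db) if idx not in deleted]
    after = frequent_itemsets(sanitized, min_ratio)
    hiding_failure = len(sensitive & after)            # sensitive, still frequent
    missing_cost = len((before - sensitive) - after)   # non-sensitive, now lost
    artificial_cost = len(after - before)              # newly frequent
    return hiding_failure, missing_cost, artificial_cost


def fitness(db, deleted, sensitive, min_ratio, weights=(0.5, 0.25, 0.25)):
    """Assumed form: weighted sum of the three side effects (lower is better)."""
    hf, mc, ac = side_effects(db, deleted, sensitive, min_ratio)
    return weights[0] * hf + weights[1] * mc + weights[2] * ac


# Toy database of five transactions; the itemset {a, b} is sensitive.
db = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"c"}]
sensitive = {frozenset({"a", "b"})}
print(side_effects(db, {0}, sensitive, 0.4))        # delete transaction 0
print(side_effects(db, {0, 1, 4}, sensitive, 0.4))  # delete transactions 0, 1, 4
```

In this toy case, deleting transaction 0 alone hides {a, b} with no side effects, whereas deleting transactions 0, 1, and 4 also hides it but makes {a, c} and {b, c} newly frequent, because the minimum support count shrinks along with the database. Adjusting the three weights changes which kind of side effect the search penalizes most.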

Highlights

  • With the rapid growth of data mining technologies in recent years, useful and meaningful information can be discovered for the purpose of decision making in different domains

  • Experiments are conducted to show the performance of the proposed cpGA2DT; they were run on a Pentium IV processor at 2 GHz with 512 MB of RAM on the Mandriva platform

  • A compact genetic algorithm (GA)-based algorithm, cpGA2DT, is proposed to hide sensitive itemsets through transaction deletion


Summary

Introduction

With the rapid growth of data mining technologies in recent years, useful and meaningful information can be discovered for the purpose of decision making in different domains. The underlying data may implicitly contain confidential information that will lead to privacy threats if it is misused. Privacy-preserving data mining (PPDM) [19,20,21,22] was proposed to reduce privacy threats by hiding sensitive information while still allowing required information to be discovered from databases. A GA-based approach is proposed to optimize the selection of transactions to be deleted, minimizing the side effects in PPDM. The second limitation noted for the traditional GA approach is the amount of memory it requires in the evaluation process. In the proposed approach, cGA is therefore applied to reduce the population size: a probability distribution is used to select the appropriate transactions to be deleted.
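To illustrate how a compact GA avoids storing a population, the following is a minimal sketch of the generic cGA scheme, not the cpGA2DT algorithm itself. It assumes a bit-vector encoding in which bit i means "delete candidate transaction i", which the source does not confirm. The names `compact_ga`, `virtual_pop`, and `toy_cost` are hypothetical, and `toy_cost` is only a stand-in for a real side-effect fitness.

```python
import random


def compact_ga(cost, length, virtual_pop=50, max_iters=10_000, seed=0):
    """Minimize `cost` over bit vectors using only a probability vector.

    Instead of a population, one probability per bit is stored. Each
    iteration samples two individuals, lets them compete, and shifts the
    probabilities by 1/virtual_pop toward the winner where the two differ.
    """
    rng = random.Random(seed)
    probs = [0.5] * length
    step = 1.0 / virtual_pop
    eps = step / 2

    def sample():
        return [1 if rng.random() < p else 0 for p in probs]

    for _ in range(max_iters):
        a, b = sample(), sample()
        winner, loser = (a, b) if cost(a) <= cost(b) else (b, a)
        for i in range(length):
            if winner[i] != loser[i]:
                delta = step if winner[i] == 1 else -step
                probs[i] = min(1.0, max(0.0, probs[i] + delta))
        # Stop once every probability has settled near 0 or 1.
        if all(p < eps or p > 1.0 - eps for p in probs):
            break
    return [1 if p >= 0.5 else 0 for p in probs]


# Stand-in cost (an assumption): Hamming distance to a fixed deletion pattern.
TARGET = [1, 0, 0, 1, 0, 0, 0, 1]


def toy_cost(bits):
    return sum(b != t for b, t in zip(bits, TARGET))


solution = compact_ga(toy_cost, length=len(TARGET))
print(solution, toy_cost(solution))
```

The memory footprint is one float per candidate transaction plus two sampled individuals, regardless of `virtual_pop`, which is the saving over a conventional GA that keeps every chromosome of the population in memory.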

Review of Related Works
Preliminaries
An Illustrated Example
Experimental Results
Conclusion