Abstract

Frequent itemset mining discovers associations among items in a large database. However, because of privacy concerns, some sensitive frequent itemsets have to be hidden from the database before it is delivered to the data miner. In this paper, we propose a greedy approach that provides an optimal solution for hiding frequent itemsets that are considered sensitive. The hiding process maximizes the utility of the modified database by introducing the least possible amount of side effects. The algorithm employs a weighting scheme that computes a weight for each transaction, which allows it to select candidate transactions at each iteration based on a measurement of side effects. We investigated the effectiveness of the proposed algorithm by comparing it with another heuristic algorithm, varying the number of sensitive frequent itemsets, the length of sensitive frequent itemsets, and the minimum support, on a number of datasets that are publicly available through the Frequent Itemset Mining Implementations (FIMI) repository. The experimental results demonstrate that our approach protects more non-sensitive frequent itemsets from being over-hidden than the heuristic approach does.
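
The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a transaction's weight for a given victim item is the number of non-sensitive frequent itemsets that would fall below the minimum support if that item were removed, and that the minimum support is an absolute count of at least 1. The names `support`, `frequent_itemsets`, and `greedy_hide` are hypothetical, and the brute-force enumeration is suitable only for toy data.

```python
from itertools import combinations


def support(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def frequent_itemsets(db, min_sup):
    """Brute-force, level-by-level enumeration (toy data only)."""
    items = sorted(set().union(*db))
    found = set()
    for k in range(1, len(items) + 1):
        level = {frozenset(c) for c in combinations(items, k)
                 if support(db, frozenset(c)) >= min_sup}
        if not level:  # no frequent k-itemsets, so no larger ones either
            break
        found |= level
    return found


def greedy_hide(db, sensitive, min_sup):
    """Remove items until every sensitive itemset has support < min_sup.

    Assumed weight (not taken from the paper): the number of
    non-sensitive frequent itemsets that would become infrequent
    if `item` were deleted from the candidate transaction.
    """
    db = [set(t) for t in db]  # work on a copy
    sensitive = [frozenset(s) for s in sensitive]
    # Itemsets to protect: frequent and not a superset of a sensitive one.
    keep = {f for f in frequent_itemsets(db, min_sup)
            if not any(s <= f for s in sensitive)}
    for s in sensitive:
        while support(db, s) >= min_sup:
            current = {f: support(db, f) for f in keep}
            best = None
            for idx, t in enumerate(db):
                if not s <= t:
                    continue  # only supporting transactions are candidates
                for item in s:
                    weight = sum(1 for f in keep
                                 if item in f and f <= t
                                 and current[f] == min_sup)
                    if best is None or weight < best[0]:
                        best = (weight, idx, item)
            _, idx, item = best
            db[idx].discard(item)  # lowers the support of s by one
    return db
```

Each pass of the inner loop commits to the locally cheapest deletion, which is what makes the sketch greedy. Ties are broken by iteration order, and the sketch makes no claim of optimality.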
