Frequent itemset mining (FIM) over large-scale databases plays a significant part in many knowledge discovery tasks, but it also exposes the data to potential privacy breaches. Privacy-preserving frequent itemset mining (PPFIM) has therefore drawn increasing attention; its ultimate goal is to hide sensitive frequent itemsets (SFIs) so that no confidential knowledge can be mined from the resulting database. Nevertheless, the vast majority of proposed PPFIM methods rely solely on database perturbation, which may incur a significant loss of data utility in order to conceal all SFIs. To alleviate this issue, this paper proposes a database reconstruction-based algorithm for PPFIM (DR-PPFIM) that not only achieves a high degree of privacy but also preserves reasonable data utility. In DR-PPFIM, all SFIs and their related frequent itemsets are first identified and removed in a pre-sanitization process using a purpose-built sanitization method. From the remaining frequent itemsets, a novel database reconstruction scheme then builds an appropriate database by integrating the concepts of inverse frequent itemset mining (IFIM) and database extension. In this way, all SFIs are hidden under the same mining threshold while the data utility of the synthetic database is kept as high as possible. Moreover, we develop a further hiding strategy in DR-PPFIM that lowers the significance of SFIs even more, reducing the risk of disclosing confidential knowledge. Extensive comparative experiments on real databases demonstrate the superiority of DR-PPFIM in terms of maximizing data utility and resisting potential threats.
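To make the terminology concrete, the following minimal Python sketch illustrates support counting, a pre-sanitization filter, and the check that an SFI is hidden under an unchanged threshold. It is not the DR-PPFIM algorithm itself: the function names, the toy databases, and the assumption that the "related" itemsets are the frequent supersets of each SFI are all illustrative choices, and the candidate database is written by hand rather than produced by IFIM.

```python
from itertools import combinations


def support(itemset, database):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in database) / len(database)


def frequent_itemsets(database, min_support):
    """Brute-force FIM: enumerate every itemset (toy data only)."""
    items = sorted(set().union(*database))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(combo, database)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


def pre_sanitize(frequent, sensitive):
    """Assumed reading of pre-sanitization: drop each SFI and every
    frequent superset of it, keeping the rest as the target itemsets."""
    sensitive = [frozenset(s) for s in sensitive]
    return {fi: s for fi, s in frequent.items()
            if not any(sfi <= fi for sfi in sensitive)}


def all_hidden(sensitive, database, min_support):
    """True if every sensitive itemset falls below the mining threshold."""
    return all(support(s, database) < min_support for s in sensitive)


# Hypothetical original database and threshold.
original = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"})]
MIN_SUPPORT = 0.5
SENSITIVE = [{"a", "b"}]

frequent = frequent_itemsets(original, MIN_SUPPORT)
kept = pre_sanitize(frequent, SENSITIVE)

# Hand-made stand-in for a reconstructed database (not generated by IFIM).
candidate = [frozenset(t) for t in (
    {"a", "c"}, {"b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"})]

print(all_hidden(SENSITIVE, original, MIN_SUPPORT))   # SFI visible in original
print(all_hidden(SENSITIVE, candidate, MIN_SUPPORT))  # SFI hidden in candidate
print(all(support(fi, candidate) >= MIN_SUPPORT for fi in kept))
```

In this toy setting, data utility corresponds to how many of the retained itemsets in `kept` remain frequent in the candidate database at the same threshold; the last line checks exactly that.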