Privacy-preserving data mining is a novel research direction in data mining and statistical databases, in which data mining algorithms are analyzed for the side effects they incur on data privacy [Verykios, V., Bertino, E., Fovino, I. G., Provenza, L. P., Saygin, Y., & Theodoridis, Y. (2004). State-of-the-art in privacy preserving data mining. SIGMOD Record 33 (1), 50–57, March 2004]. For example, through data mining one may be able to infer sensitive information, including personal information or even patterns, from non-sensitive information or unclassified data. There are two types of privacy concerns in data mining. The first type, called output privacy, requires that the data be minimally altered so that the mining result does not disclose certain private information. The second type, called input privacy, requires that the data be manipulated so that the mining result is not affected, or is only minimally affected. In output privacy for hiding association rules, current approaches require the rules or patterns to be hidden to be given in advance [Dasseni, E., Verykios, V., Elmagarmid, A., & Bertino, E. (2001). Hiding association rules by using confidence and support. In Proceedings of 4th information hiding workshop, Pittsburgh, PA (pp. 369–383); Oliveira, S., & Zaiane, O. (2002). Privacy preserving frequent itemset mining. In Proceedings of IEEE international conference on data mining, November (pp. 43–54); Oliveira, S., & Zaiane, O. (2003). Algorithms for balancing privacy and knowledge discovery in association rule mining. In Proceedings of 7th international database engineering and applications symposium (IDEAS03), Hong Kong, July; Oliveira, S., & Zaiane, O. (2003). Protecting sensitive knowledge by data sanitization. In Proceedings of IEEE international conference on data mining, November 2003; Saygin, Y., Verykios, V., & Clifton, C. (2001). Using unknowns to prevent discovery of association rules.
SIGMOD Record 30 (4), 45–54; Verykios, V., Elmagarmid, A., Bertino, E., Saygin, Y., & Dasseni, E. (2004). Association rules hiding. IEEE Transactions on Knowledge and Data Engineering 16 (4), 434–447]. Selecting these rules requires the data mining process to be executed first; the rules or patterns to be hidden are then selected manually, based on the discovered rules and the privacy requirements. However, in some applications we are interested in hiding certain constrained classes of association rules, such as informative association rule sets [Li J., Shen H., & Topor R. (2001). Mining the smallest association rule set for predictions. In Proceedings of the 2001 IEEE international conference on data mining (pp. 361–368)]. To hide such rule sets, the preprocessing step of finding the rules to be hidden can be integrated into the hiding process, provided the predicting items are given. In this work, we propose two algorithms, ISL (Increase Support of LHS) and DSR (Decrease Support of RHS), that automatically hide informative association rule sets without pre-mining and selecting the rules to be hidden. Examples illustrating the proposed algorithms are given. Numerical experiments are performed to show the various effects of the algorithms. Recommendations for the appropriate use of the proposed algorithms, based on the characteristics of the databases, are reported.
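The abstract does not spell out how ISL and DSR modify transactions. As a minimal sketch, assuming the standard definitions supp(X) (fraction of transactions containing itemset X) and conf(X→Y) = supp(X∪Y)/supp(X), the two names suggest two ways of pushing a rule's confidence below a threshold: raise supp(X) without raising supp(X∪Y) (increase support of the LHS), or lower supp(X∪Y) while leaving supp(X) unchanged (decrease support of the RHS). The function names, the single greedy pass over transactions, and the confidence-only stopping test are all hypothetical illustrations; this is not the authors' ISL or DSR procedure, which additionally derives the informative rules to hide from the given predicting items.

```python
def support(db, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in db if items <= t) / len(db)


def confidence(db, lhs, rhs):
    """conf(X -> Y) = supp(X u Y) / supp(X); assumes supp(X) > 0."""
    return support(db, set(lhs) | set(rhs)) / support(db, lhs)


def hide_by_increasing_lhs(db, lhs, rhs, min_conf):
    """Hypothetical ISL-style sketch: add the LHS items to transactions that
    contain neither the full LHS nor the full RHS. This raises supp(X) but
    leaves supp(X u Y) unchanged, so the confidence can only drop."""
    db = [set(t) for t in db]  # work on a copy; the input is not modified
    lhs, rhs = set(lhs), set(rhs)
    for t in db:
        if confidence(db, lhs, rhs) < min_conf:
            break  # rule is already hidden at this threshold
        if not lhs <= t and not rhs <= t:
            t |= lhs
    return db


def hide_by_decreasing_rhs(db, lhs, rhs, min_conf):
    """Hypothetical DSR-style sketch: remove the RHS items from transactions
    that support X u Y. Because X and Y are disjoint, supp(X) is unchanged
    while supp(X u Y) drops, so the confidence drops."""
    db = [set(t) for t in db]  # work on a copy; the input is not modified
    lhs, rhs = set(lhs), set(rhs)
    for t in db:
        if confidence(db, lhs, rhs) < min_conf:
            break  # rule is already hidden at this threshold
        if lhs | rhs <= t:
            t -= rhs
    return db


# Toy database: conf(A -> B) = (3/6) / (4/6) = 0.75 before hiding.
db = [{"A", "B"}, {"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"}, {"C"}]
isl_db = hide_by_increasing_lhs(db, {"A"}, {"B"}, min_conf=0.7)
dsr_db = hide_by_decreasing_rhs(db, {"A"}, {"B"}, min_conf=0.7)
print(confidence(isl_db, {"A"}, {"B"}), confidence(dsr_db, {"A"}, {"B"}))
```

On this toy database the LHS-based variant lowers the confidence of A → B to about 0.6 by adding A to the last transaction, while the RHS-based variant reaches 0.5 by removing B from the first transaction. The two approaches side-effect different rules and itemsets (one adds items, the other deletes them), which is the kind of trade-off the abstract's usage recommendations refer to.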