Abstract
We propose an efficient data mining algorithm to sanitize informative association rules when the database is updated, i.e., when a new data set is added to the original database. For a given predicting item, an informative association rule set [Li, J., Shen, Hong, & Topor, R. (2001). Mining the smallest association rule set for predictions. In Proceedings of the 2001 IEEE international conference on data mining (pp. 361–368)] is the smallest association rule set that makes the same prediction as the entire association rule set by confidence priority. Several approaches to sanitizing informative association rules from static databases have been proposed [Wang, S. L., Parikh, B., & Jafari, A. (2007). Hiding informative association rule sets. Expert Systems with Applications, 33(2), 316–323 and Wang, S. L., Maskey, R., Jafari, A., & Hong, T. P. (2007). Efficient sanitization of informative association rules. Expert Systems with Applications. doi: 10.1016/j.eswa.2007.07.039]. However, frequent updates to the database may require repeated sanitization of both the original database and the added data sets, and these approaches do not reuse the effort spent on previous sanitization. In this work, we propose using a pattern inversion tree to store the added data set in one database scan. The added data set is then sanitized and merged with the original sanitized database. Various characteristics of the proposed algorithm are analyzed. Numerical experiments and running time analyses show that the proposed approach outperforms direct sanitization of the combined original and added data sets, with similar side effects.
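To make the idea of incremental sanitization concrete, the following is a minimal sketch and not the algorithm proposed in this work. It assumes transactions are plain Python sets held in a list (no pattern inversion tree is built), that a rule is hidden once its confidence falls below a threshold, and that hiding is done by deleting one left-hand-side item from supporting transactions. The names `confidence`, `sanitize`, and `min_conf` are hypothetical and chosen only for illustration.

```python
from typing import List, Set

Transaction = Set[str]


def confidence(db: List[Transaction], lhs: Set[str], rhs: Set[str]) -> float:
    """Confidence of the rule lhs -> rhs: count(lhs and rhs) / count(lhs)."""
    lhs_count = sum(1 for t in db if lhs <= t)
    if lhs_count == 0:
        return 0.0
    both_count = sum(1 for t in db if (lhs | rhs) <= t)
    return both_count / lhs_count


def sanitize(db: List[Transaction], lhs: Set[str], rhs: Set[str],
             min_conf: float) -> List[Transaction]:
    """Return a copy of db in which lhs -> rhs has confidence below min_conf.

    Illustrative strategy (an assumption, not the paper's method): remove one
    lhs item from transactions that support the rule until it is hidden.
    """
    result = [set(t) for t in db]   # work on a copy, leave the input intact
    victim = min(lhs)               # deterministic choice of the item to drop
    for t in result:
        if confidence(result, lhs, rhs) < min_conf:
            break                   # rule is already hidden
        if (lhs | rhs) <= t:
            t.discard(victim)       # this transaction no longer supports lhs
    return result


# Hypothetical data: hide the rule {a} -> {y} with min_conf = 0.6.
lhs, rhs, min_conf = {"a"}, {"y"}, 0.6
original = [{"a", "b", "y"}, {"a", "y"}, {"a", "b"}, {"b", "y"}]
added = [{"a", "y"}, {"a", "y", "c"}, {"c"}]

sanitized_original = sanitize(original, lhs, rhs, min_conf)  # done once, earlier
sanitized_added = sanitize(added, lhs, rhs, min_conf)        # only the new data
merged = sanitized_original + sanitized_added

print(confidence(merged, lhs, rhs))  # 0.5
```

In this confidence-only setting, the merged confidence is a ratio of summed counts, so it cannot exceed the larger of the two parts' confidences; the rule therefore stays hidden after the merge without touching the original sanitized database again. Thresholds that also involve support, and the side effects on other rules, need the more careful treatment that the paper analyzes.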