Abstract

Interesting itemset mining is a fundamental research problem in knowledge management and machine learning. It aims to identify interesting relations between variables in a database according to some measure of interestingness, and its applications include market basket analysis, web usage mining, and intrusion detection, among many others. This paper proposes a new interestingness measure, the fault-tolerant tile, which is based on two observations: (1) the length of an itemset can be as important as its frequency; (2) knowledge discovery from real-world datasets calls for fault-tolerant data mining (e.g. extracting fault-tolerant association rules, analyzing noisy datasets). Given a user-defined fault tolerance value, we are interested in finding the maximum fault-tolerant tile and the top-k fault-tolerant tiles. Both problems are NP-hard, and the space of candidate itemsets is exponential in the number of items. While pruning the search space with a monotonic property is a common strategy in interesting itemset mining, no monotonic property is available for this problem. To tackle this challenge, we use a branch-and-bound search strategy that analyzes the characteristics of the candidate itemsets at each search branch and estimates their bounds. Our experimental results show that our algorithms can effectively analyze real datasets and retrieve meaningful results.
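To make the branch-and-bound idea concrete, the following is a minimal Python sketch under assumptions that the abstract does not state. It assumes that a transaction supports an itemset if it misses at most `delta` of its items, and that the value of a tile is its area, i.e. itemset length times fault-tolerant support. The names `ft_support` and `max_ft_tile`, as well as the upper bound used for pruning, are hypothetical and illustrative; they are not the paper's definitions or algorithm.

```python
def ft_support(itemset, transactions, delta):
    """Count transactions that miss at most `delta` items of `itemset`.

    Assumed definition of fault-tolerant support, not taken from the paper.
    """
    return sum(1 for t in transactions if len(itemset - t) <= delta)


def max_ft_tile(transactions, delta):
    """Return (area, itemset) maximizing len(itemset) * ft_support(itemset)."""
    items = sorted(set().union(*transactions))
    best_area, best_set = 0, frozenset()

    def search(current, start):
        nonlocal best_area, best_set
        support = ft_support(current, transactions, delta)
        area = len(current) * support
        if area > best_area:
            best_area, best_set = area, current

        # Upper bound for every extension of `current` in this branch:
        # the length can grow by at most the number of remaining items,
        # and under the assumed definition the support cannot increase
        # when items are added.
        remaining = len(items) - start
        if (len(current) + remaining) * support <= best_area:
            return  # no extension can beat the best tile found so far

        for i in range(start, len(items)):
            search(current | {items[i]}, i + 1)

    search(frozenset(), 0)
    return best_area, best_set


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"d"},
]
area, tile = max_ft_tile(transactions, delta=1)
print(area, sorted(tile))  # 12 ['a', 'b', 'c']
```

In this toy dataset the area itself is not monotonic: `{a}` has area 5, `{a, b, c}` has area 12, and `{a, b, c, d}` drops to 4, so the search cannot simply stop when an extension scores lower. The sketch therefore prunes a branch only when an upper bound on all of its extensions cannot exceed the best area found so far.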
