Abstract
Most real-life databases contain data whose correctness is uncertain. Growing demand for efficient algorithms to mine frequent itemsets from such uncertain databases has produced several approaches in recent years, but all of them rely on support-based constraints to prune the combinatorial search space. A support-based constraint alone is not enough, because frequent itemsets may have weak affinity. Even a very high minimum support is not effective for finding correlated patterns with increased weight or support affinity. A few approaches for precise databases propose new measures for mining correlated patterns, but they are not applicable to uncertain databases, because certain and uncertain databases differ both semantically and computationally. In this paper, we propose a new strategy, Weighted Uncertain Interesting Pattern Mining (WUIPM), in which a tree structure (WUIP-tree) and several new measures (e.g., uConf, wUConf) are used to mine correlated patterns from uncertain databases. To our knowledge, this is the first work to consider the weight or importance of individual items alongside the correlation between the items of a pattern in uncertain databases. Additionally, we propose a new metric for the WUIP-tree, the prefix proxy value (pProxy), which helps improve mining performance. A comprehensive performance study shows that our strategy (a) generates fewer but valuable patterns and (b) is faster than existing approaches even when affinity measures are not applied.