Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases

Jerry Chun-Wei Lin, Wensheng Gan, Tzung-Pei Hong, Philippe Fournier-Viger

doi:10.1007/978-3-319-41920-6_18

Abstract

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. Extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed in recent years. WFIM considers that items may differ in weight (importance), and UFIM takes into account that data collected in real-life environments is often inaccurate, imprecise, or incomplete. Recently, a two-phase Apriori-based approach called HEWI-Uapriori was proposed to consider both item weight and uncertainty when mining high expected weighted itemsets (HEWIs), but it generates a large number of candidates and is time-consuming. In this paper, a more efficient algorithm named HEWI-Utree is developed to mine HEWIs without performing multiple database scans and without generating an enormous number of candidates. It relies on three novel structures, named the element (E)-table, the weighted-probability (WP)-table, and the WP-tree, to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is more efficient than traditional WFIM and UFIM methods, as well as the HEWI-Uapriori algorithm, in terms of runtime, memory usage, and scalability.
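To make the notion of a HEWI concrete, the following is a minimal brute-force sketch, not the HEWI-Utree algorithm and not any of its E-table, WP-table, or WP-tree structures. It assumes definitions that the abstract does not spell out: the weight of an itemset is taken as the average weight of its items, the probability of an itemset in a transaction as the product of its item probabilities, and an itemset is reported when its weight times its expected support reaches a minimum ratio times the database size. The toy data, the function names, and the `min_ratio` parameter are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy uncertain database: each transaction maps an item
# to its existence probability in that transaction.
database = [
    {"A": 0.9, "B": 0.6, "C": 0.4},
    {"A": 0.5, "C": 0.8},
    {"B": 0.7, "C": 1.0},
]
# Hypothetical item weights (importance), each in (0, 1].
weights = {"A": 0.2, "B": 0.75, "C": 0.9}


def itemset_weight(itemset, weights):
    """Assumed definition: average weight of the items in the itemset."""
    return sum(weights[i] for i in itemset) / len(itemset)


def expected_support(itemset, database):
    """Sum, over transactions containing the itemset, of the product
    of the item probabilities (assumes independent items)."""
    total = 0.0
    for transaction in database:
        if all(i in transaction for i in itemset):
            p = 1.0
            for i in itemset:
                p *= transaction[i]
            total += p
    return total


def expected_weighted_support(itemset, database, weights):
    """Assumed definition: itemset weight times expected support."""
    return itemset_weight(itemset, weights) * expected_support(itemset, database)


def mine_hewis_bruteforce(database, weights, min_ratio):
    """Enumerate every itemset and keep those meeting the threshold.
    Exponential in the number of items; for illustration only."""
    threshold = min_ratio * len(database)
    items = sorted({i for t in database for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            score = expected_weighted_support(itemset, database, weights)
            if score >= threshold:
                result[itemset] = score
    return result


hewis = mine_hewis_bruteforce(database, weights, min_ratio=0.25)
for itemset, score in sorted(hewis.items()):
    print(itemset, round(score, 4))
```

Exhaustive enumeration of this kind grows exponentially with the number of items, which is why candidate generation and repeated database scans dominate the cost of level-wise approaches and why early pruning of unpromising itemsets matters.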

Full Text