Abstract
High-utility itemset mining (HUIM) extends traditional frequent itemset mining by considering both the quantities and the unit profits of items in a database, revealing highly profitable itemsets regardless of their size. High average-utility itemset mining (HAUIM) instead finds itemsets by considering both their utility and the number of items they contain. Average-utility itemsets are thus obtained with a fairer utility measure, since the average utility typically does not increase much with the size of an itemset. However, most algorithms for discovering high average-utility itemsets (HAUIs) are designed to extract patterns from a static database. If the size of the database increases or decreases over time (e.g., as a result of transaction insertions), the database must be scanned again in batch mode to update the results, so previously discovered knowledge is ignored and the time already spent on pattern extraction is wasted. We therefore present FUP-HAUIMI, an incremental HAUIM algorithm for transaction insertion that maintains information about patterns when a database is updated, based on the FUP concept. An average-utility-list (AUL) structure is first built by scanning the original database. Then, FUP-HAUIMI selects the high average-utility upper-bound itemsets and categorizes them into four cases. For each case, itemsets are maintained and updated using a specific updating procedure. While the enumeration tree representing the search space is traversed depth-first, a join operation is performed to quickly and incrementally update the AUL-structures. Several experiments were carried out to evaluate the runtime, memory usage, number of potential patterns (candidates), and scalability of the designed approach. Results show that FUP-HAUIMI performs well compared to the state-of-the-art HAUI-Miner algorithm running in batch mode and to IHAUPM, the state-of-the-art algorithm for incremental average-utility pattern mining.
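The difference between utility and average utility described in the abstract can be illustrated with a minimal sketch. The toy transactions, the unit profits, and the function names below are hypothetical and do not come from the paper; the sketch assumes the usual reading in which the utility of an itemset is the sum of quantity times unit profit over the transactions that contain it, and the average utility divides that sum by the number of items in the itemset.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data (not from the paper): each transaction maps an item
# to its purchased quantity, and UNIT_PROFIT gives the profit per unit.
TRANSACTIONS: List[Dict[str, int]] = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]
UNIT_PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def utility(itemset: Iterable[str]) -> int:
    """Sum quantity * unit profit over every transaction containing the itemset."""
    items = set(itemset)
    total = 0
    for transaction in TRANSACTIONS:
        if items.issubset(transaction):  # the transaction contains every item
            total += sum(transaction[i] * UNIT_PROFIT[i] for i in items)
    return total


def average_utility(itemset: Iterable[str]) -> float:
    """Utility divided by the itemset size (the length-normalised measure)."""
    items = set(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    return utility(items) / len(items)


if __name__ == "__main__":
    for candidate in (["a"], ["a", "b"], ["a", "b", "c"]):
        print(candidate, utility(candidate), average_utility(candidate))
```

In this toy database the itemset {a, b} has a higher total utility than {a}, but a lower average utility, which is the length-normalising effect the abstract refers to.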
Highlights
Mining useful or meaningful information is a major KDD (Knowledge Discovery in Databases) task, which has been widely considered interesting and useful for more than two decades.
Experiments show that the designed FUP-HAUIMI algorithm maintains and updates the discovered HAUIs with better performance than the state-of-the-art HAUI-Miner algorithm running in batch mode and the state-of-the-art incremental IHAUPM algorithm.
When transactions are inserted into the original database, the designed FUP-HAUIMI algorithm first divides the high average-utility upper-bound itemsets (HAUUBIs) into four cases; the itemsets of each case are then maintained and updated by a dedicated procedure.
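The four-case split mentioned in the last highlight can be sketched as follows. This is an assumption-laden illustration, not the paper's procedure: it assumes the cases follow the standard FUP partition (an itemset is or is not a high upper-bound itemset in the original database, and is or is not one in the inserted transactions), and that "high" means the upper-bound value reaches a fraction delta of the total utility of the corresponding database part. All function and parameter names are hypothetical.

```python
def fup_case(auub_orig: float, auub_new: float,
             tu_orig: float, tu_new: float, delta: float) -> int:
    """Return the FUP-style case number (1 to 4) for one itemset.

    auub_orig / auub_new: upper-bound value in the original database and in
    the inserted transactions; tu_orig / tu_new: total utility of each part;
    delta: relative threshold in (0, 1]. All names are hypothetical.
    """
    high_orig = auub_orig >= delta * tu_orig
    high_new = auub_new >= delta * tu_new
    if high_orig and high_new:
        return 1  # high in both parts: stays high after the update
    if high_orig:
        return 2  # high only in the original part: must be re-checked
    if high_new:
        return 3  # high only in the inserted part: needs original information
    return 4      # high in neither part: cannot become high


def high_after_update(auub_orig: float, auub_new: float,
                      tu_orig: float, tu_new: float, delta: float) -> bool:
    """Check the itemset against the threshold of the updated database."""
    return auub_orig + auub_new >= delta * (tu_orig + tu_new)


if __name__ == "__main__":
    # Hypothetical numbers: original total utility 100, inserted 20, delta 0.4.
    for orig, new in ((50, 10), (50, 5), (30, 10), (30, 5)):
        print(orig, new, fup_case(orig, new, 100, 20, 0.4),
              high_after_update(orig, new, 100, 20, 0.4))
```

Under these assumptions, cases 1 and 4 are decided without further work, because both values sit on the same side of their thresholds; only cases 2 and 3 require combining stored and new information.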
Summary
Mining useful or meaningful information is a major KDD (Knowledge Discovery in Databases) task, which has been widely considered interesting and useful for more than two decades. A fundamental algorithm named Apriori [1] was first designed to mine association rules (ARs). It discovers patterns level by level in a static database. An Apriori-like approach was first designed that considers the length (size) of each itemset. It calculates the average utility of each itemset instead of its utility as in HUIM, which provides a flexible way of measuring the importance of itemsets for decision-making. The auub (Average-Utility Upper Bound) model [20] was presented to obtain a downward closure property by maintaining the HAUUBIs (High Average-Utility Upper-Bound Itemsets), which reduces the search space for discovering the set of HAUIs. Lin et al. [21] developed an efficient HAUP-tree. The AUL-structure is utilized in the designed algorithm to efficiently keep information for mining patterns and incrementally updating results. Experiments show that the designed FUP-HAUIMI algorithm maintains and updates the discovered HAUIs with better performance than the state-of-the-art HAUI-Miner algorithm running in batch mode and the state-of-the-art incremental IHAUPM algorithm.
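The role of the auub model can be made concrete with a small sketch. It assumes the commonly used definition, in which the upper bound of an itemset is the sum, over the transactions containing it, of the largest single-item utility in each transaction; the summary itself does not spell this out, so treat the definition, the toy data, and all names below as assumptions.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data (not from the paper): item -> quantity per transaction.
TRANSACTIONS: List[Dict[str, int]] = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]
UNIT_PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def transaction_max_utility(transaction: Dict[str, int]) -> int:
    """Largest utility (quantity * unit profit) of any single item."""
    return max(q * UNIT_PROFIT[i] for i, q in transaction.items())


def auub(itemset: Iterable[str]) -> int:
    """Assumed upper bound: sum of transaction maxima over containing transactions."""
    items = set(itemset)
    return sum(transaction_max_utility(t)
               for t in TRANSACTIONS if items.issubset(t))


def average_utility(itemset: Iterable[str]) -> float:
    """Utility of the itemset divided by its size, for comparison with auub."""
    items = set(itemset)
    total = sum(t[i] * UNIT_PROFIT[i]
                for t in TRANSACTIONS if items.issubset(t) for i in items)
    return total / len(items)


if __name__ == "__main__":
    for candidate in (["a"], ["a", "b"], ["a", "b", "c"]):
        print(candidate, average_utility(candidate), auub(candidate))
```

Because a superset can only appear in fewer transactions, its bound never exceeds that of its subsets, and the bound is never below the true average utility. An itemset whose bound falls under the threshold can therefore be pruned together with all its supersets, which is how the HAUUBIs shrink the search space.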