Frequent Itemset Mining Research Articles

High utility itemset mining is an important extension of frequent itemset mining that treats the unit profits and quantities of items as external and internal utilities, respectively. Because the utility function does not have the downward closure property, mining methods prune the search space using an anti-monotonic upper bound that overestimates utility. The transaction-weighted utilization (TWU) of an itemset was the first such upper bound and remains one of the most important; various algorithms use it. A variety of high utility itemset mining methods have attempted to tighten the utility upper bound and have exploited appropriate pruning strategies to improve mining efficiency. Although TWU and its improved alternatives prune the search space, they still generate a significant number of candidates that are high-TWU but are not high utility itemsets. Calculating the actual utilities of these low utility candidates requires multiple scans of the dataset and imposes a large overhead on the mining methods, which can cancel out the pruning benefits of the upper bounds. Appropriate pruning strategies, efficient data structures, and tight anti-monotonic upper bounds can overcome this problem and lead to significant performance improvements in high utility itemset mining methods. In this paper, a new projection-based method called MAHI (matrix-aided high utility itemset mining) is introduced. It uses a novel utility matrix-based pruning strategy, called MA-prune, to improve the execution time of high utility itemset mining. The experimental results show that MAHI is faster than former algorithms.
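The relationship between utility and TWU described in this abstract can be illustrated with a small Python sketch. It is not the MAHI method or MA-prune, whose details the abstract does not give; it is a brute-force illustration of the standard definitions. The toy data (`PROFIT`, `TRANSACTIONS`), the function names, and the threshold of 10 are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper).
# External utility: unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1}
# Internal utility: quantity of each item in each transaction.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]


def transaction_utility(transaction):
    """Total utility of all items in one transaction."""
    return sum(PROFIT[item] * qty for item, qty in transaction.items())


def utility(itemset, transactions):
    """Actual utility: profit * quantity of the itemset's items,
    summed over the transactions that contain the whole itemset."""
    return sum(
        sum(PROFIT[item] * t[item] for item in itemset)
        for t in transactions
        if all(item in t for item in itemset)
    )


def twu(itemset, transactions):
    """Transaction-weighted utilization: sum of the full transaction
    utilities of every transaction that contains the itemset."""
    return sum(
        transaction_utility(t)
        for t in transactions
        if all(item in t for item in itemset)
    )


def high_twu_itemsets(transactions, min_util):
    """Brute-force enumeration of itemsets whose TWU reaches min_util.
    Real miners avoid this enumeration by pruning with the bound."""
    items = sorted({item for t in transactions for item in t})
    return [
        combo
        for k in range(1, len(items) + 1)
        for combo in combinations(items, k)
        if twu(combo, transactions) >= min_util
    ]


MIN_UTIL = 10  # assumed threshold for the example
candidates = high_twu_itemsets(TRANSACTIONS, MIN_UTIL)
high_utility = [c for c in candidates if utility(c, TRANSACTIONS) >= MIN_UTIL]
```

On this toy data, the itemset `("a", "b")` has a higher utility than its subset `("b",)`, so utility itself is not anti-monotonic, while TWU never increases when an item is added. The candidates `("b",)` and `("c",)` pass the TWU threshold but fail the actual utility check, which is the kind of wasted candidate the abstract refers to.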

Mining frequent itemsets from transactional data streams has become essential, with applications such as stock market analysis, retail chain analysis, and web log analysis. Various algorithms have been proposed to efficiently mine single-port and multi-port transactional streams within the constraints of limited time and memory. However, all of them are budget algorithms, i.e., they cannot handle a varying inter-arrival rate of transactions or high-speed streams. They are constrained by a maximum transaction arrival rate, beyond which they fail to process the stream. These algorithms also cannot give immediate mining results, even with compromised accuracy, when required. Handling variable arrival rates and producing immediate results are the two properties that characterize an anytime algorithm. We propose AnyFI, which is the first anytime frequent itemset mining algorithm for data streams. AnyFI uses a novel data structure, the BFI-forest, which is capable of handling transactions arriving at a variable rate. It maintains itemsets in the BFI-forest in such a way that it can give a mining result almost immediately when the time allowance for mining is very small, and can refine its accuracy as the time allowance increases. We also propose MPAnyFI, which extends AnyFI into a parallel framework for anytime frequent itemset mining of multi-port data streams over commodity clusters. It uses AnyFI at each computing node of the cluster. Our extensive experimental analysis shows that AnyFI can handle high stream speeds close to 60,000 trans/sec with recall close to 100%. The experiments also show the efficiency of MPAnyFI.

Articles published on Frequent Itemset Mining

DisCANTree: A Distributed Algorithm for Incremental Frequent Itemset Mining based on MapReduce

Directions of membrane separator development for microbial fuel cells: A retrospective analysis using frequent itemset mining and descriptive statistical approach

Distributed Frequent Itemset Mining Using Size Based Assignment Technique

SAT‐based and CP‐based declarative approaches for Top‐Rank‐K closed frequent itemset mining

A Survey of High-utility Itemsets Mining

Enriching E Commerce Fraud Detection by using Machine Learning

Evaluation of Frequent Itemset Mining Algorithms-Apriori and FP Growth

Mining frequent weighted closed itemsets using the WN-list structure and an early pruning strategy

A Dynamic Sliding Window based Balanced Parallel Frequent Itemset Mining Algorithm in Data Stream

An efficient FP-Growth based association rule mining algorithm using Hadoop MapReduce

Discovering informative features in large-scale landmark image collection

Application of MapReduce parallel association mining on IDS in cloud computing environment

CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets

Incremental frequent itemsets mining based on frequent pattern tree and multi-scale

An efficient projection-based method for high utility itemset mining using a novel pruning approach on the utility matrix

A new approximate method for mining frequent itemsets from big data

Anytime Frequent Itemset Mining of Transactional Data Streams

Mining frequent itemsets from streaming transaction data using genetic algorithms

Toward fault-tolerant and secure frequent itemset mining outsourcing in hybrid cloud environment

Fast Dimensional Analysis for Root Cause Investigation in a Large-Scale Service Environment
