Itemset Mining Research Articles

Traditional data mining approaches are generally designed for small, centralized, static datasets. When a dataset grows rapidly, these algorithms become infeasible because of their heavy consumption of computational and I/O resources. Frequent itemset mining (FIM) is a key data mining task with applications in a variety of domains, yet traditional FIM algorithms struggle to process large, dynamic datasets efficiently. This research introduces a distributed incremental approximation frequent itemset mining (DIAFM) algorithm that addresses these challenges using shard-based approximation within the MapReduce framework. DIAFM lowers computational overhead by reducing dataset scans, bypassing exact support checks, and applying shard-level error thresholds to balance efficiency against accuracy. Extensive experiments show that DIAFM reduces runtime by 40–60% compared with traditional methods, with accuracy losses within 1–5%, even for datasets of more than 500,000 transactions. Because the algorithm is incremental, new data increments are handled without reprocessing the entire dataset, which makes it particularly suitable for real-time, large-scale applications such as transaction analysis and IoT data streams. These results demonstrate the scalability, robustness, and practical applicability of DIAFM and establish it as a competitive, efficient solution for mining frequent itemsets in distributed, dynamic environments.
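The general idea of shard-level approximation with incremental updates can be illustrated with a small single-machine sketch. This is not the DIAFM algorithm itself, whose details the abstract does not give; the class `IncrementalApproxMiner`, the parameters `min_support`, `shard_error`, and `max_size`, and the pruning rule are all assumptions chosen for illustration. Each shard is scanned once, itemsets falling below a relaxed per-shard threshold are dropped, and the surviving counts are merged into a running total, so earlier shards are never rescanned. The accumulated counts are therefore lower bounds on the true counts.

```python
from collections import Counter
from itertools import combinations


def shard_counts(shard, max_size=2):
    """Count every itemset of up to max_size items in one shard (single scan)."""
    counts = Counter()
    for transaction in shard:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


class IncrementalApproxMiner:
    """Hypothetical sketch: approximate frequent itemsets over arriving shards."""

    def __init__(self, min_support, shard_error, max_size=2):
        self.min_support = min_support  # global relative support threshold
        self.shard_error = shard_error  # assumed per-shard slack below min_support
        self.max_size = max_size
        self.counts = Counter()         # merged counts (lower bounds)
        self.total = 0                  # transactions seen so far

    def add_shard(self, shard):
        """Process one new shard without touching previously seen data."""
        local = shard_counts(shard, self.max_size)
        threshold = (self.min_support - self.shard_error) * len(shard)
        for itemset, count in local.items():
            # Keep only itemsets that are (nearly) frequent within this shard.
            if count >= threshold:
                self.counts[itemset] += count
        self.total += len(shard)

    def frequent(self):
        """Return itemsets whose accumulated support meets min_support."""
        cutoff = self.min_support * self.total
        return {
            itemset: count / self.total
            for itemset, count in self.counts.items()
            if count >= cutoff
        }


miner = IncrementalApproxMiner(min_support=0.5, shard_error=0.1)
miner.add_shard([["a", "b"], ["a", "b"], ["a", "c"], ["b"]])
miner.add_shard([["a", "b"], ["a"], ["c"], ["a", "b", "c"]])
result = miner.frequent()
```

In a MapReduce setting, `shard_counts` would roughly correspond to the map step and the merge inside `add_shard` to the reduce step; that mapping is likewise an assumption, not something the abstract specifies.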

In pattern mining, high-utility itemset mining (HUIM) is used to discover high-utility patterns. Heuristic approaches to HUIM have difficulty producing better offspring: they are ineffective in search space organization, population diversity, and utility calculation, which hurts runtime and accuracy. Few researchers have experimented with the genetic algorithm (GA), and those who have still face the same issues. To overcome these problems, a novel approach to HUIM using a modified GA and optimized local search (HUIM-MGALS) is proposed, with six contributions. First, utility is linked with the Bitmap dataset to reduce utility access time, leading to effective search space organization. Second, HUIM-MGALS employs a fitness scaling strategy to avoid redundancy. Third, a high-utility itemset (HUI) revision strategy is employed to explore significant HUIs. Fourth and fifth, a modified population diversity maintenance strategy and iterative crossover help preserve significant HUIs and improve search capability. Sixth, multiple mutations refine wasted individuals to boost accuracy. Extensive experimentation showed that HUIM-MGALS significantly outperforms the algorithms it was compared against, running up to 8.6 times faster. It also demonstrates superior HUI discovery capabilities on both sparse and dense datasets. This is supported by the modified population diversity maintenance strategy, which proved to be the most impactful modification for HUI discovery in HUIM-MGALS.
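As a rough illustration of how a bitmap can speed up utility lookups, the sketch below stores one bitmask per item (bit `tid` is set when the item occurs in transaction `tid`) and computes an itemset's utility only over the transactions where all its items co-occur. It is a generic bitmap technique under assumed data structures, not the HUIM-MGALS implementation; the function names `build_bitmap` and `itemset_utility` and the sample utilities are hypothetical. A GA-based miner could use such a function as its fitness evaluation.

```python
def build_bitmap(transactions):
    """Map each item to an integer bitmask of the transactions containing it.

    transactions: list of dicts mapping item -> utility in that transaction.
    """
    bitmap = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmap[item] = bitmap.get(item, 0) | (1 << tid)
    return bitmap


def itemset_utility(itemset, bitmap, transactions):
    """Sum the utilities of the itemset over transactions containing all its items."""
    if not itemset:
        return 0
    cover = -1  # all bits set; narrowed by AND-ing each item's bitmask
    for item in itemset:
        cover &= bitmap.get(item, 0)
    total = 0
    tid = 0
    while cover:
        if cover & 1:
            total += sum(transactions[tid][item] for item in itemset)
        cover >>= 1
        tid += 1
    return total


# Hypothetical toy database: item -> utility per transaction.
transactions = [
    {"a": 5, "b": 2},
    {"a": 10, "c": 1},
    {"a": 5, "b": 4, "c": 3},
    {"b": 6},
]
bitmap = build_bitmap(transactions)
utility_ab = itemset_utility(("a", "b"), bitmap, transactions)
utility_ac = itemset_utility(("a", "c"), bitmap, transactions)
```

Because the intersection of bitmasks identifies the covering transactions in a few integer operations, itemsets that never co-occur are rejected without reading any utility values.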

Articles published on Itemset Mining

Multi-level high utility-itemset hiding.

Dynamic Association Mining Techniques for the Faster Extraction of High Utility Itemsets from Incremental Databases

Damped weighted erasable itemset mining with time sensitive dynamic environments

An efficient PSO-based evolutionary model for closed high-utility itemset mining

Secure Two-Party Frequent Itemset Mining With Guaranteeing Differential Privacy

A Spatial-Temporal Exploration of Coordination Failures Preceding Coal Mine Explosion Accidents in China

A hierarchical set-enumeration tree enabling high occupancy item set mining and the use of an adaptive occupancy threshold

DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset Mining

Re-induction based mining for high utility item-sets

High utility itemset mining in data stream using elephant herding optimization

Optimization of frequent item set mining parallelization algorithm based on spark platform

Optimized Privacy Preserved Itemset Mining using Federated Learning from Transactional Data in Data Mining

Optimal Keyword Selection by Hybrid Optimization with Itemset Mining for Text Summarization in Biomedical Sector

Retraction Note: High utility itemset mining: a Boolean operators-based modified grey wolf optimization algorithm

Modified Genetic Algorithm for Efficient High-Utility Itemset Mining

Enhancing frequent itemset mining through machine learning and nature-inspired algorithms: a comprehensive review

IPHM: Incremental periodic high-utility mining algorithm in dynamic and evolving data environments

An Improved Apriori Algorithm Based on the Spark Platform

Effective approaches for mining correlated and low-average-cost patterns

Product Layout Analysis Based on Consumer Purchasing Patterns Using Apriori Algorithm