Maximal Itemsets Research Articles

Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. Our previous work on the maximal fully-correlated itemset (MFCI) framework rules out itemsets that contain irrelevant items, and its downward-closed property helps achieve good computational performance. However, two computational issues remain when calculating MFCIs in large databases. First, unlike maximal frequent itemset mining, which can start pruning from 1-itemsets, MFCI mining must start pruning from 2-itemsets. When the number of items \(n\) in a dataset is large and the supports of all pairs cannot be loaded into memory, the IO cost (\(O(n^2)\)) of calculating the correlation of all pairs can be very high. Second, users usually need to try different correlation thresholds to obtain the MFCIs they want, so the cost of rerunning the Apriori procedure for each new threshold is also very high. We propose two techniques to solve these problems. First, we identify the correlation upper bound for any good correlation measure to avoid unnecessary IO queries for the support of pairs, and we exploit their common monotone property to prune many pairs without even computing their correlation upper bounds. Second, we build an enumeration tree that stores the fully-correlated value of every MFCI under a given initial correlation threshold. We can then either efficiently retrieve the desired MFCIs for any threshold above the initial threshold or incrementally grow the tree if the threshold is below it. Experimental results show that our algorithm can be an order of magnitude faster than the original MFCI algorithm.
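The upper-bound idea in this abstract can be illustrated with a minimal sketch. The abstract does not state the bound itself, so everything below is an assumption: the phi coefficient stands in for "a good correlation measure", the measure is assumed to be nondecreasing in the joint support, and the names `phi`, `upper_bound`, and `candidate_pairs` are hypothetical. Because the support of a pair can never exceed the smaller of the two single-item supports, substituting that value gives a bound that needs only 1-itemset supports, and pairs whose bound falls below the threshold never require a lookup of their joint support.

```python
from itertools import combinations
from math import sqrt


def phi(p_ab, p_a, p_b):
    """Phi coefficient of two binary items, computed from support fractions.

    Assumes 0 < p_a < 1 and 0 < p_b < 1 so the denominator is nonzero.
    """
    return (p_ab - p_a * p_b) / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


def upper_bound(measure, p_a, p_b):
    """Bound on measure(A, B) using only single-item supports.

    Assumption: the measure is nondecreasing in the joint support, and
    support(A, B) <= min(support(A), support(B)).
    """
    return measure(min(p_a, p_b), p_a, p_b)


def candidate_pairs(item_support, threshold, measure=phi):
    """Return the pairs whose upper bound reaches the threshold.

    Only these pairs would need their joint support fetched from disk.
    """
    keep = []
    for a, b in combinations(sorted(item_support), 2):
        if upper_bound(measure, item_support[a], item_support[b]) >= threshold:
            keep.append((a, b))
    return keep


# Illustrative single-item supports (made-up values).
item_support = {"bread": 0.60, "milk": 0.55, "caviar": 0.02}
pairs = candidate_pairs(item_support, threshold=0.5)
print(pairs)
```

With these made-up supports, only the bread and milk pair survives: a rare item such as caviar cannot be strongly correlated with a very common one, whatever their joint support turns out to be.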

Real-world applications of association rule mining suffer from the well-known problem of discovering a large number of rules, many of which are not interesting or useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the volume of rules discovered and the complexity of the task, but the implications of their use, and the important differences in generalization power, precision, and recall when they are used for classification, have not been examined. In this paper, we present a systematic evaluation of the association rules discovered by frequent, closed, and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriate sequence of usage. The experiments use a number of real-world datasets that represent diverse characteristics of data/items, and rule sets are evaluated in detail both as a whole and with respect to individual classes. Empirical results confirm that, with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant, and contradictive rules can be eliminated while preserving relatively high precision and recall. More importantly, the results reveal the important characteristics of, and differences between, frequent, closed, and maximal itemsets for the classification task, and the effect of incorporating statistical/heuristic measures for optimizing such rule sets. With closed itemset mining already a preferred choice for complexity and redundancy reduction during rule generation, this study further confirms that closed itemset based association rules are also of better quality in terms of classification precision and recall, both overall and on individual class examples. On the other hand, maximal itemset based association rules, which are a subset of closed itemset based rules, prove insufficient in this regard and typically have worse recall and generalization power. Empirical results also show the downside of using the confidence measure at the start to generate association rules, as is typically done within the association rule framework. Removing rules that fall below a certain confidence threshold also removes the knowledge that they contradict relatively higher-confidence rules; precision can therefore be increased by discarding contradictive rules before applying the confidence constraint.
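The relationship between frequent, closed, and maximal itemsets that this abstract relies on can be seen on a toy example. The sketch below is a brute-force illustration, not the mining algorithms evaluated in the paper; the transactions, the minimum count of 2, and the function names are all made up for the example. It uses the standard definitions: an itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of itemsets contained in >= min_count transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_count:
                result[itemset] = count
    return result


def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of equal support."""
    return {
        s for s, c in freq.items()
        if not any(s < t and freq[t] == c for t in freq)
    }


def maximal_itemsets(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in freq if not any(s < t for t in freq)}


# Toy transactions (made-up data).
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("a"),
    frozenset("bd"),
]
freq = frequent_itemsets(transactions, min_count=2)
closed = closed_itemsets(freq)
maximal = maximal_itemsets(freq)

print(len(freq), len(closed), len(maximal))
```

Here seven frequent itemsets shrink to four closed itemsets ({a}, {b}, {a, b}, {a, b, c}) and a single maximal itemset ({a, b, c}). The closed sets still record that {a, b} occurs more often than {a, b, c}, whereas the maximal set alone loses that support information, which is consistent with the abstract's observation that maximal itemset based rules are a subset of closed itemset based rules.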

Articles published on Maximal Itemsets

A New Algorithm for Extracting Textual Maximal Frequent Itemsets from Arabic Documents

Classification of Arabic Documents depending on Maximal Frequent Itemsets

Deriving Frequent Itemsets from Lossless Condensed Representation

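PENERAPAN ALGORITMA MAX-MINER UNTUK ANALISIS POLA BELANJA KONSUMEN (STUDI KASUS KAFELOAJA) [Application of the Max-Miner Algorithm for Consumer Shopping Pattern Analysis (Case Study: Kafeloaja)]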

Reference itemsets: useful itemsets to approximate the representation of frequent itemsets

The lattice‐based approaches for mining association rules: a review

A distributed maximal frequent itemset mining with multi agents system on bitmap join indexes selection

Speeding up maximal fully-correlated itemsets search in large databases

BAHUI

Evaluation and optimization of frequent, closed and maximal association rule based classification

Frequent item set mining

High utility pattern mining using the maximal itemset property and lexicographic tree structures

Efficient Mining Algorithms of Finding Frequent Datasets

Maximal Frequent Itemset Generation Using Segmentation Apporach

Generalized association rule mining using an efficient data structure

Meta itemset: a new concise representation of frequent itemset

Distributed Mining of Maximal Frequent Itemsets on a Data Grid System

Generalized domination in closure systems

An Efficient Graph-Based Method for Parallel Mining Problems

Design and Implementation of a Generator of Large, Dense, or Sparse Databases to Test Association Rules Miner
