Data heterogeneity's impact on the performance of frequent itemset mining algorithms

Antonio Manuel Trasierras, José María Luna, Philippe Fournier-Viger, Sebastián Ventura

doi:10.1016/j.ins.2024.120981

Abstract

Frequent itemset mining (FIM) is a widely used task that extracts frequently occurring itemsets from data. Many deterministic algorithms are available for this demanding task. However, experimental studies have not considered that data heterogeneity significantly affects the algorithms' performance, which leads to unfair comparisons and biased conclusions. This paper addresses that gap by comparing cutting-edge algorithms under various frequency thresholds while accounting for the resulting data heterogeneity. An extensive experimental study is carried out, using the number of itemsets mined per second as the performance measure to compare algorithms. The experiments define eight metrics to quantify data heterogeneity, and the algorithms' performance varies with the values of these metrics. The results reveal that some techniques (hypercube decomposition and k-items machine) are essential to achieve excellent performance on any dataset, and most algorithms perform similarly well when they include those techniques. Finally, different threshold values produce dissimilar data subsets (data heterogeneity is not an immutable data characteristic), so a preliminary study of the database characteristics with a few minimum support thresholds could help select the best-suited FIM algorithm beforehand.
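
To make the performance measure concrete, the following is a minimal sketch of how "itemsets mined per second" could be computed at several minimum support thresholds. It assumes a naive level-wise (Apriori-style) miner, which is not one of the algorithms evaluated in the paper; the function names `frequent_itemsets` and `itemsets_per_second` and the toy `transactions` data are hypothetical and used only for illustration.

```python
import time
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def frequent_itemsets(transactions, min_support):
    """Naive level-wise search; returns {itemset: absolute support count}."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        result.update(current)
        k += 1
    return result


def itemsets_per_second(transactions, min_support):
    """Return (number of frequent itemsets, itemsets mined per second)."""
    start = time.perf_counter()
    found = frequent_itemsets(transactions, min_support)
    elapsed = time.perf_counter() - start
    rate = len(found) / elapsed if elapsed > 0 else float("inf")
    return len(found), rate


if __name__ == "__main__":
    # The same database yields different outputs at different thresholds.
    for threshold in (2, 3, 4):
        n, rate = itemsets_per_second(transactions, threshold)
        print(f"min_support={threshold}: {n} itemsets, {rate:.0f} itemsets/s")
```

On this toy database the miner finds 7, 6, and 3 frequent itemsets at thresholds 2, 3, and 4 respectively, a small illustration of how the threshold changes the portion of the data an algorithm effectively works on. The measured rate depends on the machine and is not meaningful at this scale.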

Journal: Information Sciences
Publication Date: Jun 11, 2024
Citations: 1
License type: cc-by-nc

Similar Papers

A review on support threshold free frequent itemsets mining approaches
Saif-Ur-Rehman ... M Ahsan
01 Dec 2016

Revised ECLAT Algorithm for Frequent Itemset Mining
Bharati Suvalka ... Sarika Khandelwal
01 Jan 2015

A Fast & Memory Efficient Technique for Mining Frequent Item Sets from a Data Set
Richa Mathur ... Virendra Kumar
IOSR Journal of Computer Engineering | VOL. 16
01 Jan 2014

Metaheuristics for Frequent and High-Utility Itemset Mining
Youcef Djenouri ... Asma Belhadi
01 Jan 2019
