Abstract

Sparseness is often observed in big data emanating from a variety of sources, including IoT, pervasive computing, and behavioral data. Frequent itemset mining is the first and foremost step of association rule mining, a well-known unsupervised machine learning problem. However, frequent itemset mining techniques have been the least explored on sparse real-world data, where they show broadly comparable performance. In contrast, these methods are well validated on dense data, where they differ markedly in performance. Hence, there is a strong need both to evaluate these techniques and to propose new ones for large sparse real-world datasets. In this study, a novel method, Mining Frequent Itemsets by Iterative TRimmed Transaction lattICE (TRICE), is proposed. TRICE iteratively generates combinations of varying-sized trimmed subsets of $I$, where $I$ denotes the set of distinct items in a database. Extensive experiments are conducted to assess TRICE against the HARPP, FP-Growth, optimized SaM, and optimized RElim algorithms. The experimental results show that TRICE outperforms all of these algorithms in terms of both running time and memory consumption, and that it maintains a substantial performance gap on all sparse real-world datasets at all minimum support thresholds. Moreover, the memory use of the optimized SaM and RElim algorithms is assessed here for the first time.
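As a point of reference for the abstract's description of generating combinations of varying-sized subsets, the following naive Python sketch counts the support of every non-empty subset of every transaction and keeps those that reach a minimum support. It is an illustrative assumption, not TRICE or HARPP: the function name `frequent_itemsets_powerset` is hypothetical, and support is taken as an absolute count rather than a fraction. A transaction with k distinct items yields 2^k − 1 subsets, which is why trimming transactions matters when transactions are long.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets_powerset(transactions, min_support):
    """Hypothetical naive miner: count every non-empty subset of every
    transaction and keep those whose absolute support >= min_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so equal itemsets share a key
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
    for itemset, support in sorted(frequent_itemsets_powerset(db, 2).items()):
        print(itemset, support)
```

On the toy database above, with a minimum support of 2, the frequent itemsets are {a}, {b}, {c}, {a, c}, and {b, c}; the exponential subset enumeration is the cost that any practical method must contain.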

Highlights

  • The mining of association rules is regarded as one of the leading problems in data mining

  • Sparse real-world data is found to be extremely useful for companies, and its inclusion leads to improved predictive analytics

  • This paper presents Iterative TRimmed Transaction lattICE (TRICE), a novel algorithm to mine frequent itemsets from sparse real-world datasets


Summary

INTRODUCTION

The mining of association rules is regarded as one of the leading problems in data mining. Frequent itemset mining holds a prominent place in data science as the basis for generating association rules, episodes, and correlations [24]. It finds collections of items that occur together in the transactions of a database [1]. A novel method, Mining Frequent Itemsets by Iterative TRimmed Transaction lattICE (TRICE), is proposed to efficiently extract frequent itemsets from sparse real-world transactional datasets. TRICE improves on the HARPP (HARnessing the Power of Powersets for Mining Frequent Itemsets) algorithm [33] by eliminating its excessive memory consumption on datasets with longer average transaction lengths. TRICE achieves its speed and memory savings through an elegant transaction-trimming mechanism and its handling of identical transactions. It is compared with HARPP, FP-Growth, and the optimized SaM (Split and Merge) and RElim (Recursive Elimination) algorithms on six real-world sparse datasets.
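The introduction credits TRICE's efficiency to transaction trimming and to its handling of identical transactions, without detailing either here. The sketch below shows one common way to realize those two ideas, under our own assumptions; the functions `trim_and_collapse` and `count_supports` are hypothetical names, and this is not the authors' TRICE implementation. Trimming is safe because any itemset containing an infrequent item is itself infrequent (the downward-closure property), and collapsing identical trimmed transactions lets each distinct transaction's subsets be enumerated once and weighted by its multiplicity.

```python
from collections import Counter
from itertools import combinations


def trim_and_collapse(transactions, min_support):
    """Hypothetical preprocessing: drop infrequent items from every
    transaction, then merge identical trimmed transactions with a weight."""
    # Pass 1: absolute support of each single item (once per transaction).
    item_support = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in item_support.items() if c >= min_support}

    # Pass 2: trim each transaction to its frequent items and count duplicates.
    weighted = Counter()
    for t in transactions:
        trimmed = tuple(sorted(set(t) & frequent))
        if trimmed:  # a transaction trimmed to nothing contributes no itemsets
            weighted[trimmed] += 1
    return weighted


def count_supports(weighted, min_support):
    """Enumerate subsets of each distinct trimmed transaction once,
    adding its multiplicity to every subset's support."""
    counts = Counter()
    for items, weight in weighted.items():
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += weight
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    db = [["a", "b", "d"], ["b", "a"], ["a", "c"], ["c", "e"]]
    w = trim_and_collapse(db, 2)
    print(dict(w))
    print(count_supports(w, 2))
```

In the example, items d and e are trimmed away, and the two transactions that reduce to (a, b) are stored once with weight 2, so their subsets are generated only once while the resulting supports match a brute-force count over the original database.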

RELATED WORK
TRICE EXAMPLE
EXPERIMENTAL EVALUATION OF TRICE
Findings
CONCLUSION