D-GENE: Deferring the GENEration of Power Sets for Discovering Frequent Itemsets in Sparse Big Data

Muhammad Yasir, Hamayoun Shahwani, Shahzad Sarwar, Mudassar Ahmad, Muhammad Ashraf, Muhammad Asif Habib, Muhammad Umar Chaudhry, Ch Muhammad Nadeem Faisal

doi:10.1109/access.2020.2971834

Abstract

Sparseness is a distinctive aspect of the big data generated by numerous applications at present. Furthermore, real-world sparse datasets contain several similar records. Based on the Iterative Trimmed Transaction Lattice (ITTL), the recently proposed TRICE algorithm learns frequent itemsets efficiently from sparse datasets. TRICE stores similar transactions once and then eliminates the infrequent part of each distinct transaction. However, removing the infrequent part of two or more distinct transactions may result in similar trimmed transactions. TRICE repeatedly generates the ITTLs of these similar trimmed transactions, which induces redundant computations and eventually degrades runtime efficiency. This paper presents D-GENE, a technique that optimizes TRICE by introducing a deferred ITTL generation mechanism. D-GENE suspends ITTL generation until the transaction pruning phase is complete. The deferral strategy enables D-GENE to generate the ITTL of similar trimmed transactions only once. Experimental results show that, by avoiding the redundant computations, D-GENE achieves better runtime efficiency. D-GENE outperforms TRICE, FP-growth, and optimized versions of the SaM and RElim algorithms, especially when the difference between the number of distinct transactions and the number of distinct trimmed transactions is large.
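
The deferral idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes that "similar" transactions are identical ones, and that building an ITTL can be approximated by enumerating the non-empty subsets (the power set) of a trimmed transaction. The names `mine_deferred` and `nonempty_subsets` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of a sorted tuple of items."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def mine_deferred(transactions, min_support):
    """Toy frequent-itemset miner that defers subset generation.

    Illustrative only: subset enumeration stands in for ITTL generation,
    which is an assumption and not taken from the paper.
    """
    # Store identical transactions once, together with a multiplicity.
    distinct = Counter(frozenset(t) for t in transactions)

    # Count single items to decide which ones are frequent.
    item_counts = Counter()
    for items, count in distinct.items():
        for item in items:
            item_counts[item] += count
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pruning phase: trim each distinct transaction and merge the
    # trimmed transactions that turn out to be identical.
    trimmed = Counter()
    for items, count in distinct.items():
        kept = tuple(sorted(items & frequent_items))
        if kept:
            trimmed[kept] += count

    # Deferred phase: enumerate subsets once per distinct trimmed
    # transaction, weighting each subset by the merged multiplicity.
    support = Counter()
    for items, count in trimmed.items():
        for subset in nonempty_subsets(items):
            support[subset] += count

    frequent = {s: c for s, c in support.items() if c >= min_support}
    return frequent, len(distinct), len(trimmed)
```

For the five transactions `["a","b","x"]`, `["a","b","y"]`, `["a","b"]`, `["a","b","x"]`, `["c"]` with `min_support=2`, the sketch finds 4 distinct transactions but only 2 distinct trimmed ones, so subsets are enumerated twice instead of four times. Subset enumeration is exponential in transaction length, so the sketch is only sensible for the short trimmed transactions typical of sparse data.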

Highlights

In the realm of data science, association analysis has emerged as an unavoidable technique that explores strong relationships in voluminous databases
Association analysis is increasingly being deployed in numerous areas such as recommendation systems [2], study of market basket data [3], smart systems [4]–[7], IoT [8]–[10], fog and mobile edge computing [11], mining of data streams
Sparseness is the distinctive aspect of large real-world data generated by numerous sources, including pervasive computing, behavioral data, transactional data, and IoT applications, especially fog and mobile edge computing (MEC)

Summary

INTRODUCTION

In the realm of data science, association analysis has emerged as an unavoidable technique that explores strong relationships in voluminous databases. Efficient identification of frequent itemsets is still an active research problem, even though numerous techniques have been proposed so far. Frequent itemset mining explores collections of items that occur together in a transactional database [1]. Sparseness is a distinctive aspect of large real-world data generated by numerous sources, including pervasive computing, behavioral data, transactional data, and IoT applications, especially fog and mobile edge computing (MEC). Techniques based on the Iterative Transaction Lattice (ITL) have recently been proposed to learn frequent itemsets from large, sparse real datasets [52], [53]. Based on the Iterative Trimmed Transaction Lattice (ITTL), the TRICE algorithm efficiently explores frequent itemsets from sparse real datasets [53]. This paper proposes a technique named Deferring the GENEration of Power sets for Mining Frequent Itemsets from Sparse Big data (D-GENE).

RELATED WORK

D-GENE

Findings

CONCLUSION


Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 8	License type: CC BY 4.0


Similar Papers

TRICE: Mining Frequent Itemsets by Iterative TRimmed Transaction LattICE in Sparse Big Data
Muhammad Yasir ... Muhammad Umar Chaudhry
IEEE Access | VOL. 7
01 Jan 2019

Finding multiple global linear correlations in sparse and noisy data sets
Shunzhi Zhu ... Tao Li
Knowledge-Based Systems | VOL. 53
30 Aug 2013

Stratigraphic uncertainty in sparse versus rich data sets in a fluvial-deltaic outcrop analog: Ferron Notom delta in the Henry Mountains region, southern Utah
Weiguo Li ... Janok P Bhattacharya
AAPG Bulletin | VOL. 96
01 Mar 2012

Using sparse photometric data sets for asteroid lightcurve studies
Brian D Warner ... Alan W Harris
Icarus | VOL. 216
20 Oct 2011
