Abstract

Sparseness is a distinctive characteristic of the big data generated by many present-day applications. Furthermore, real-world sparse datasets contain many similar records. The recently proposed TRICE algorithm, based on the Iterative Trimmed Transaction Lattice (ITTL), learns frequent itemsets efficiently from sparse datasets. TRICE stores similar transactions once and then eliminates the infrequent part of each distinct transaction. However, removing the infrequent part of two or more distinct transactions may yield similar trimmed transactions. TRICE generates the ITTLs of such trimmed transactions repeatedly, which induces redundant computation and ultimately degrades runtime efficiency. This paper presents D-GENE, a technique that optimizes TRICE by introducing a deferred ITTL generation mechanism. D-GENE suspends ITTL generation until the transaction pruning phase is complete. This deferral strategy enables D-GENE to generate the ITTLs of similar trimmed transactions only once. Experimental results show that, by avoiding the redundant computations, D-GENE achieves better runtime efficiency. D-GENE outperforms TRICE, FP-growth, and optimized versions of the SaM and RElim algorithms, especially when the difference between the number of distinct transactions and the number of distinct trimmed transactions is large.

Highlights

  • In the realm of data science, association analysis has emerged as an unavoidable technique that explores strong relationships in voluminous databases

  • Association analysis is increasingly being deployed in numerous areas such as recommendation systems [2], study of market basket data [3], smart systems [4]–[7], IoT [8]–[10], fog and mobile edge computing [11], mining of data streams

  • Sparseness is the distinctive aspect of large real-world data generated by numerous sources, including pervasive computing, behavioral data, transactional data, and IoT applications, especially fog and mobile edge computing (MEC)


Summary

INTRODUCTION

In the realm of data science, association analysis has emerged as an indispensable technique for exploring strong relationships in voluminous databases. Efficient identification of frequent itemsets remains an active research problem, even though numerous techniques have been proposed so far. Frequent itemset mining explores collections of items that occur together in a transactional database [1]. Sparseness is a distinctive characteristic of the large real-world data generated by numerous sources, including pervasive computing, behavioral data, transactional data, and IoT applications, especially fog and mobile edge computing (MEC). Techniques based on the Iterative Transaction Lattice (ITL) have recently been proposed to learn frequent itemsets from large, sparse real-world datasets [52], [53]. Based on the Iterative Trimmed Transaction Lattice (ITTL), the TRICE algorithm efficiently explores frequent itemsets from sparse real datasets [53]. This paper proposes a technique named Deferring the GENEration of Power sets for Mining Frequent Itemsets from Sparse Big data (D-GENE).
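The deferral idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not build an ITTL. It assumes that "similar" trimmed transactions means identical item sets, and it stands in for lattice generation with a plain enumeration of each trimmed transaction's non-empty subsets (its power set). The function name `frequent_itemsets_deferred` and the parameter `min_support` (an absolute count) are hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets_deferred(transactions, min_support):
    """Illustrative sketch only: deferred power-set generation.

    Assumption: subset enumeration stands in for ITTL generation.
    """
    # Pass 1: count how many transactions contain each item.
    item_counts = Counter()
    for t in transactions:
        item_counts.update(set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2 (pruning phase): drop infrequent items from every transaction
    # and merge transactions that become identical after trimming.
    trimmed_counts = Counter()
    for t in transactions:
        trimmed = frozenset(t) & frequent_items
        if trimmed:
            trimmed_counts[trimmed] += 1

    # Deferred phase: only now enumerate subsets, once per distinct
    # trimmed transaction, weighting each subset by the merged count.
    support = Counter()
    for trimmed, count in trimmed_counts.items():
        items = sorted(trimmed)
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                support[subset] += count

    return {s: c for s, c in support.items() if c >= min_support}


transactions = [["a", "b", "x"], ["a", "b", "y"], ["a", "c"], ["b"]]
result = frequent_itemsets_deferred(transactions, min_support=2)
```

In this toy input the first two transactions are distinct before pruning but both trim to {a, b}, so their subsets are enumerated once with a weight of 2 instead of twice. Subset enumeration is exponential in the length of a trimmed transaction, so the sketch is only reasonable for the short transactions typical of sparse data.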

RELATED WORK
D-GENE
Findings
VIII. CONCLUSION