A Distributed Method for Fast Mining Frequent Patterns From Big Data

Peng-Yu Huang, Wen-Yu Chung, Wan-Shu Cheng, Ju-Chin Chen, Kawuu W Lin, Young-Lin Chen

doi:10.1109/access.2021.3115514


Open Access

https://doi.org/10.1109/access.2021.3115514


Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 8	License type: CC BY 4.0

Affiliation: National University of Kaohsiung

Abstract

In recent years, knowledge discovery in databases has provided a powerful capability to discover meaningful and useful information. Frequent pattern mining and association rule mining have been extensively studied for numerous real-life applications. Traditional mining algorithms assume that data are centralized and memory-resident. When these methods are applied to distributed databases, especially in this era of big data, the large amount of data, limited bandwidth, and energy constraints make their performance inadequate. Hence, data mining in distributed environments has emerged as an important research area. To improve performance, we propose (1) a set of algorithms based on frequent pattern (FP) growth that discover FPs and provide fast, scalable service in distributed computing environments and (2) a compact data structure that stores items and counts to minimize the data transmitted over the network. To ensure completeness and execution capability, DistEclat and BigFIM were selected for the experimental comparison. Experiments show that the proposed method is more cost-effective for processing massive datasets and performs well under various experimental conditions. On average, the proposed method required only 33% of the execution time and 45% of the transmission cost of DistEclat, and only 23.3% of the execution time and 14.2% of the transmission cost of BigFIM.
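
The abstract says the method transmits items and counts rather than raw data. The exact DFP data structure is not described on this page, so the following is only a minimal sketch under an assumed scheme: each node summarizes its partition as sorted (item, count) pairs, and a coordinator sums those summaries and keeps the items that meet a minimum support, ordered by descending support as FP-growth expects. The function names `local_item_counts`, `merge_counts`, and `frequent_item_list` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Transaction = List[str]


def local_item_counts(partition: Iterable[Transaction]) -> List[Tuple[str, int]]:
    """Summarize one node's partition as sorted (item, count) pairs.

    Assumption: this compact item/count summary is what each node sends
    instead of its raw transactions; DFP's real format may differ.
    """
    counts: Counter = Counter()
    for txn in partition:
        counts.update(set(txn))  # count each item at most once per transaction
    return sorted(counts.items())


def merge_counts(summaries: Iterable[List[Tuple[str, int]]]) -> Dict[str, int]:
    """Coordinator side: sum per-node summaries into global supports."""
    total: Counter = Counter()
    for summary in summaries:
        for item, count in summary:
            total[item] += count
    return dict(total)


def frequent_item_list(global_counts: Dict[str, int], min_support: int) -> List[Tuple[str, int]]:
    """Keep items with support >= min_support, by descending support (ties by name)."""
    frequent = [(item, c) for item, c in global_counts.items() if c >= min_support]
    return sorted(frequent, key=lambda ic: (-ic[1], ic[0]))


if __name__ == "__main__":
    node_a = [["a", "b", "c"], ["a", "c"], ["b", "d"]]
    node_b = [["a", "c", "d"], ["c", "d"], ["a", "b", "c"]]
    summaries = [local_item_counts(node_a), local_item_counts(node_b)]
    print(frequent_item_list(merge_counts(summaries), min_support=3))
    # prints [('c', 5), ('a', 4), ('b', 3), ('d', 3)]
```

In this sketch the network carries at most one pair per distinct item per node, independent of how many transactions a partition holds, which is the intuition behind shipping counts instead of data.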

Highlights

Knowledge discovery in databases provides a powerful capability to discover meaningful and useful information
The primary contributions of this study are (1) a set of algorithms based on frequent pattern (FP) growth that discover FPs and provide fast, scalable service in distributed computing environments and (2) a compact data structure that stores items and counts to minimize the data transmitted over the network
To compare the performance of the proposed method with that of DistEclat and BigFIM, real datasets from the Frequent Itemset Mining Dataset (FIMD) Repository were used in the experiments

Summary

INTRODUCTION

Knowledge discovery in databases provides a powerful capability to discover meaningful and useful information. Parallel and distributed computing techniques have attracted attention because of their ability to manage and process large amounts of data. Prior studies in this area share the same drawbacks: long data transmission time, high memory cost, high cost of scanning the database to discover frequent patterns (FPs), and redundant execution time caused by unadaptable nodes. To reduce the execution time and the redundant execution cost, we propose a distributed and parallel computing method called DFP (distributed frequent pattern mining). The primary contributions of this study are (1) a set of algorithms based on FP growth that discover FPs and provide fast, scalable service in distributed computing environments and (2) a compact data structure that stores items and counts to minimize the data transmitted over the network.
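
DFP is described as being built on FP growth. The DFP algorithm itself is not reproduced on this page, so the following is only a minimal single-node sketch of the classic FP-growth procedure it builds on: an FP-tree whose nodes hold an item and a count, plus recursive mining of conditional pattern bases. The names `Node`, `build_tree`, and `fp_growth` are hypothetical, and how the work is split across machines is not shown.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterator, List, Optional, Tuple

Weighted = List[Tuple[List[str], int]]  # (transaction or prefix path, weight)


class Node:
    """FP-tree node: an item label, a count, and links to parent/children."""

    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "Node"] = {}


def build_tree(weighted: Weighted, min_support: int):
    """Build an FP-tree; return (frequent item supports, header table)."""
    support: Dict[str, int] = defaultdict(int)
    for txn, w in weighted:
        for item in set(txn):
            support[item] += w
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header: Dict[str, List[Node]] = defaultdict(list)
    for txn, w in weighted:
        # Keep frequent items only, in descending support order (ties by name).
        path = sorted({i for i in txn if i in frequent}, key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)  # header table links nodes per item
            child.count += w
            node = child
    return frequent, header


def fp_growth(weighted: Weighted, min_support: int,
              suffix: FrozenSet[str] = frozenset()) -> Iterator[Tuple[FrozenSet[str], int]]:
    """Yield (itemset, support) for every itemset with support >= min_support."""
    frequent, header = build_tree(weighted, min_support)
    for item, sup in frequent.items():
        itemset = suffix | {item}
        yield itemset, sup
        # Conditional pattern base: the prefix path above each node of `item`.
        cond: Weighted = []
        for node in header[item]:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                cond.append((path, node.count))
        if cond:
            yield from fp_growth(cond, min_support, itemset)
```

For example, `fp_growth([(t, 1) for t in db], min_support=2)` mines a list of transactions `db`. Because each conditional pattern base only contains items that precede the suffix item in the tree order, every frequent itemset is generated exactly once.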

ASSOCIATION RULE MINING

DISTRIBUTED ALGORITHMS FOR DISCOVERY OF FREQUENT PATTERNS

PROBLEM DEFINITION

[Figure: surviving labels include Count, Total count (7 and 20), Number of TID, and nodes c: 1, d: 1, b: 1]

DFP mining algorithm

EXPERIMENTAL EVALUATION AND PERFORMANCE STUDY

EXPERIMENTAL SETUP

EXPERIMENTAL RESULTS

Summary

Findings

CONCLUSION

Similar Papers

Distributed Approach for Efficiently Extracting Common Patterns from Large Datasets
Dr P Senthil
International Journal for Research in Applied Science and Engineering Technology | VOL. 12
31 Mar 2024

Frequent pattern generation algorithms for Association Rule Mining : Strength and challenges
Hemant Kumar Soni ... Manisha Jain
01 Mar 2016

Mining frequent patterns and association rules using similarities
Ansel Y Rodríguez-González ... José Ruiz-Shulcloper
Expert Systems With Applications | VOL. 40
27 Jun 2013

Privacy-Preserving Frequent Pattern Mining from Big Uncertain Data
Carson K Leung ... Alfredo Cuzzocrea
01 Dec 2018

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

A Distributed Method for Fast Mining Frequent Patterns From Big Data

Abstract

Highlights

Summary

Talk to us

Similar Papers

More From: IEEE Access