Abstract

This paper explores five pattern mining problems and proposes a new distributed framework called DT-DPM: Decomposition Transaction for Distributed Pattern Mining. DT-DPM addresses the limitations of existing pattern mining approaches by reducing the search space that must be enumerated. It derives the relevant patterns by studying the correlations among transactions: it first decomposes the set of transactions into several clusters of different sizes, and then exploits heterogeneous architectures, including MapReduce, a single CPU, and multiple CPUs, according to the density of each subset of transactions. To evaluate the DT-DPM framework, extensive experiments were carried out on five pattern mining problems (FIM: Frequent Itemset Mining, WIM: Weighted Itemset Mining, UIM: Uncertain Itemset Mining, HUIM: High Utility Itemset Mining, and SPM: Sequential Pattern Mining). The experimental results show that DT-DPM improves the scalability of the pattern mining algorithms on large databases and that it outperforms the baseline parallel pattern mining algorithms on big databases.
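The decompose-then-dispatch idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: transactions are modelled as item sets and clustered with plain DBSCAN (the paper's outline lists a DBSCAN section) using Jaccard distance, while the density measure, the thresholds, and the helper names `dbscan_transactions`, `decompose`, `cluster_density`, and `choose_backend` are hypothetical.

```python
# Sketch only: the Jaccard/DBSCAN clustering, the density measure and the
# backend thresholds below are illustrative assumptions, not DT-DPM's code.


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty transactions count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan_transactions(transactions, eps=0.5, min_pts=2):
    """Plain DBSCAN over transactions; returns a cluster id per transaction (-1 = noise)."""
    n = len(transactions)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # The eps-neighbourhood includes the transaction itself (distance 0).
        return [j for j in range(n)
                if jaccard_distance(transactions[i], transactions[j]) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand it again
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


def decompose(transactions, eps=0.5, min_pts=2):
    """Group transactions by DBSCAN label (label -1 collects the noise points)."""
    clusters = {}
    labels = dbscan_transactions(transactions, eps, min_pts)
    for t, label in zip(transactions, labels):
        clusters.setdefault(label, []).append(t)
    return clusters


def cluster_density(cluster):
    """Hypothetical density: mean transaction length / number of distinct items."""
    items = set().union(*cluster)
    if not items:
        return 0.0
    return sum(len(t) for t in cluster) / (len(cluster) * len(items))


def choose_backend(cluster, big_size=10_000, dense=0.5):
    """Hypothetical dispatch rule: very large clusters go to MapReduce,
    dense ones to a multi-CPU miner, and the rest to a single CPU."""
    if len(cluster) >= big_size:
        return "mapreduce"
    if cluster_density(cluster) >= dense:
        return "multi-cpu"
    return "single-cpu"


db = [frozenset("abc"), frozenset("abd"), frozenset("abcd"),
      frozenset("xyz"), frozenset("xy"), frozenset("q")]
for label, cluster in decompose(db).items():
    print(label, [sorted(t) for t in cluster], choose_backend(cluster))
```

With `eps=0.5` and `min_pts=2`, the first three transactions form one cluster, `xyz` and `xy` form a second, and `q` is left as noise. The paper's actual clustering parameters and dispatch criteria are not given in this summary, so the thresholds here are placeholders only.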

Highlights

  • Pattern mining is a data mining task that aims at studying the correlations within data and discovering relevant patterns from large databases

  • These results are obtained thanks to several factors: i) the decomposition method applied in the DT-DPM framework, which minimizes the number of separator items, ii) solving sub-problems with a small number of transactions and a small number of items, instead of processing the whole transactional database with all of its distinct items, and iii) the ability of the pattern mining algorithms to be integrated into the DT-DPM framework

  • For each cluster of transactions, the pattern mining algorithm is launched in order to discover the relevant patterns
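The separator-item criterion and the per-cluster mining step in the highlights can be illustrated with a small sketch. It assumes that a separator item is an item occurring in transactions of more than one cluster, and it uses the classic partition approach (mine each cluster locally, then verify candidates against the whole database) as a stand-in; this is not confirmed to be DT-DPM's exact procedure, and `separator_items`, `support`, `apriori`, and `mine_by_clusters` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

# Sketch only: the separator-item definition and the partition-style
# mine-then-verify scheme are illustrative assumptions, not DT-DPM's code.


def separator_items(clusters):
    """Items that occur in transactions of more than one cluster (assumed definition)."""
    clusters_per_item = Counter()
    for cluster in clusters:
        clusters_per_item.update(set().union(*cluster))  # each item once per cluster
    return {item for item, k in clusters_per_item.items() if k > 1}


def support(itemset, transactions):
    """Relative support: fraction of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, minsup):
    """Level-wise Apriori: all itemsets with relative support >= minsup."""
    if not transactions:
        return set()
    level = [frozenset([item]) for item in set().union(*transactions)]
    frequent, k = set(), 1
    while level:
        current = {c for c in level if support(c, transactions) >= minsup}
        frequent |= current
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def mine_by_clusters(clusters, minsup):
    """Mine each cluster locally, then keep candidates frequent in the whole database.

    An itemset whose global relative support reaches minsup must reach it in at
    least one non-empty cluster, so the union of the local results is complete.
    """
    candidates = set()
    for cluster in clusters:
        candidates |= apriori(cluster, minsup)  # independent, parallelisable step
    everything = [t for cluster in clusters for t in cluster]
    return {c for c in candidates if support(c, everything) >= minsup}


clusters = [[frozenset("abc"), frozenset("abd"), frozenset("abcd")],
            [frozenset("xyz"), frozenset("xya")]]
print(separator_items(clusters))  # {'a'}
print(sorted("".join(sorted(p)) for p in mine_by_clusters(clusters, 0.6)))  # ['a', 'ab', 'b']
```

The final verification pass is what keeps the result exact: a locally frequent itemset can still be globally infrequent (here `{x, y}` reaches the threshold in the second cluster but not in the whole database). Under the assumed definition, when there are no separator items every itemset's occurrences are confined to a single cluster, so the clusters can be mined without reference to one another's items.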


Summary

Introduction

Pattern mining is a data mining task that aims at studying the correlations within data and discovering relevant patterns from large databases. The pattern mining problem is to find an efficient approach to extracting the relevant patterns from a database. Pattern mining is used in many applications and domains, such as ontology matching [1], process mining [2], decision making [3], and constraint programming [4]. It is also applied in "Big data" settings, such as extracting frequent genes from DNA in bioinformatics [5], finding relevant hashtags in Twitter streams for social network analysis [6], and analyzing sensor data from IoT devices in smart city applications [7].

Motivation
Contributions
Outline
DBSCAN
Mining process
Dataset description
Decomposition performance
Speedup of DT-DPM
DT-DPM Vs state-of-the-art algorithms
Conclusion