Abstract

In recent years, high-utility itemset mining (HUIM) has been investigated extensively and applied in many domains, especially market-basket analysis and related applications. Because current market-basket scenarios also rely on IoT equipment such as sensors and smart devices to collect information, it is necessary to consider mining high-utility itemsets (HUIs) in large-scale databases, particularly in IoT settings. This work presents a GA-based MapReduce model, called GMR-Miner, for mining closed high-utility patterns in large-scale databases. A k-means model is first adopted to group transactions according to their correlation, based on the frequency factor. A genetic algorithm (GA) is then used within the developed MapReduce framework to explore potential candidates in a limited time. The developed 3-tier MapReduce model can also be deployed in Spark to handle large-scale databases for discovering closed high-utility patterns. Extensive experiments were conducted to compare GMR-Miner with the state-of-the-art CLS-Miner. The results show that GMR-Miner outperforms CLS-Miner in memory usage, scalability, and runtime.

Highlights

  • With the rapid growth of information technologies such as machine learning models, the Internet of Things (IoT) [1], and edge and cloud computing [2, 3], data-driven mining has become an important topic for extracting meaningful information from the data these technologies collect

  • Several pattern mining models [4,5,6,7,8,9] have been extensively studied; the most fundamental pattern mining task in knowledge discovery in databases (KDD) is association rule mining (ARM), which is deployed across varied applications and specific domains

  • Main findings are as follows: (i) We design a 3-tier MapReduce framework deployed in Spark for mining closed high-utility itemsets (CHUIs) in large-scale datasets; (ii) A k-means model is used to group relevant transactions into clusters, ensuring that the discovered CHUIs are complete and correct; (iii) A genetic algorithm (GA)-based model uses the MapReduce framework to explore potential candidates in a limited time, greatly reducing the computational cost; (iv) Experimental evaluation shows that the GA-based MapReduce Miner (GMR-Miner) performs strongly


Summary

Introduction

With the rapid growth of information technologies such as machine learning models, the Internet of Things (IoT) [1], and edge and cloud computing [2, 3], data-driven mining has become an important topic for extracting meaningful information from the data these technologies collect.

The generic HUIM algorithm [16] does not hold the downward closure (DC) property when revealing the set of HUIs, so it must explore a huge search space. To solve this limitation, the transaction-weighted utilization (TWU) model [14] uses the transaction utility to construct high transaction-weighted utilization itemsets (HTWUIs), whose values serve as upper bounds that maintain the DC property; in HUIM this property is called transaction-weighted downward closure (TWDC).

Main findings are as follows: (i) We design a 3-tier MapReduce framework deployed in Spark for mining CHUIs in large-scale datasets; (ii) A k-means model is used to group relevant transactions into clusters, ensuring that the discovered CHUIs are complete and correct; (iii) A GA-based model uses the MapReduce framework to explore potential candidates in a limited time, greatly reducing the computational cost; (iv) Experimental evaluation shows that GMR-Miner performs strongly.
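The TWU upper bound mentioned in this summary can be illustrated with a minimal sketch. It is not the GMR-Miner implementation: the toy transactions, the unit profits, and the function names (`transaction_utility`, `utility`, `twu`) are all hypothetical and chosen only for illustration. The sketch shows that exact utility can grow when an item is added to an itemset, while TWU never does, which is why TWU can be used for pruning.

```python
from itertools import combinations

# Hypothetical toy database (not from the paper): item -> purchased quantity.
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 4},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def transaction_utility(tx):
    """Utility of a whole transaction: sum of quantity * unit profit."""
    return sum(qty * PROFIT[item] for item, qty in tx.items())


def utility(itemset, db):
    """Exact utility of an itemset, summed over transactions containing it."""
    return sum(
        sum(tx[item] * PROFIT[item] for item in itemset)
        for tx in db
        if all(item in tx for item in itemset)
    )


def twu(itemset, db):
    """Transaction-weighted utilization: total utility of every transaction
    that contains the itemset. It is an upper bound on utility(itemset)."""
    return sum(
        transaction_utility(tx)
        for tx in db
        if all(item in tx for item in itemset)
    )


def all_itemsets(items):
    """Enumerate every non-empty itemset (brute force, toy sizes only)."""
    for size in range(1, len(items) + 1):
        yield from combinations(sorted(items), size)


# Exact utility is not anti-monotone: adding "c" to {"a"} raises it.
print(utility(("a",), DB), utility(("a", "c"), DB))  # 20 25
# TWU is anti-monotone: a superset never has a larger TWU.
print(twu(("a",), DB), twu(("a", "c"), DB))          # 34 34
```

Under these assumptions, any itemset whose TWU falls below the minimum utility threshold can be discarded together with all of its supersets, since none of them can be a high-utility itemset.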

Related Work
Preliminary and Problem Statement
The Developed GA-Based MapReduce Model for CHUIM
Experimental Evaluation
Findings
Conclusion and Discussion