Abstract

Over the past decade, high-utility itemset mining (HUIM) has received widespread attention because it can surface more critical information than frequent itemset mining (FIM). Like FIM, however, HUIM decides whether an itemset qualifies using a binary model based on a pre-defined minimum utility threshold. In addition, most previous HUIM work has focused on single, small datasets, which is unrealistic for today's real-world big data environments. In this work, fuzzy-set theory and the MapReduce framework are combined to design novel high fuzzy utility pattern mining algorithms that address these issues. First, fuzzy-set theory is incorporated into a new algorithm, efficient high fuzzy utility itemset mining (EFUPM), which discovers high fuzzy utility patterns on a single machine. Two upper bounds are then estimated to allow early pruning of unpromising candidates in the search space. To handle large-scale datasets, a Hadoop-based high fuzzy utility pattern mining (HFUPM) algorithm is developed on the Hadoop framework. Experimental results show that the proposed algorithms perform strongly compared with current state-of-the-art approaches when mining high fuzzy utility patterns, both on a single machine and in a large-scale environment.

Highlights

  • With the rapid growth of information technology, knowledge discovery in databases (KDD) has become a critical research field for revealing important, valuable, and interesting knowledge from unprocessed data sources [1,2,3,4,5,6,7]

  • Many kinds of patterns and rules have been proposed to capture the essential concepts in a dataset; association rule mining (ARM) and frequent-itemset mining (FIM) are fundamental KDD techniques that have been applied in many domains and applications [4,8,9,10,11]

  • The main contributions of this paper are summarized as follows: 1. We propose an efficient high fuzzy utility pattern mining (EFUPM) algorithm that discovers high fuzzy utility patterns from a database on a single machine


Summary

Introduction

With the rapid growth of information technology, knowledge discovery in databases (KDD) has become a critical research field for revealing important, valuable, and interesting knowledge from unprocessed data sources [1,2,3,4,5,6,7]. High-utility itemset mining (HUIM) was proposed to reveal more valuable information from a database. Yao et al. proposed HUIM to reveal high-utility itemsets (HUIs) effectively [13]; both the purchase quantities and the unit profits of items are considered when identifying high-utility itemsets in a transaction database. Fuzzy-set theory was later brought into HUIM to provide an alternative form of knowledge for pattern mining. It addresses the limitations of the above works, in particular by representing quantities as linguistic terms, which makes the discovered patterns more interpretable. To handle large-scale databases, the Hadoop-based high fuzzy utility pattern mining (HFUPM) algorithm is introduced here; it uses several MapReduce tasks to scan the original dataset and keeps temporary files as small as possible. To further reduce the search space, two upper bounds are designed to remove unpromising candidates early; they apply to both the single-machine and the Hadoop-based frameworks for mining high fuzzy utility patterns.
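To make the role of purchase quantities, unit profits, and the minimum utility threshold concrete, the following is a minimal sketch of classical (crisp) HUIM by brute-force enumeration. It is not the EFUPM or HFUPM algorithm, and it does not model the fuzzy linguistic terms or the two upper bounds; the toy transactions, the profit table, and the function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchase quantity.
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 3, "c": 1}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Enumerate all itemsets and keep those whose utility reaches min_utility."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, transactions, unit_profit)
            if u >= min_utility:  # binary decision against the threshold
                result[itemset] = u
    return result


print(high_utility_itemsets(transactions, unit_profit, min_utility=14))
```

In this toy data the itemset {a, b} has a higher utility than {a} alone, so utility cannot be pruned the way support is in FIM. That is the usual motivation for upper bounds such as the two mentioned above, which allow unpromising candidates to be discarded without enumerating every itemset.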

Related work
Preliminaries and Problem Statement
Proposed EFUPM and HFUPM
Itemset Fuzzy Utility Upper Bound
Searching graph for mining process
Pruning strategies
Developed EFUPM algorithm
Developed HFUPM algorithm
MapReduce 1
MapReduce 2
MapReduce 3
Experimental evaluation
Data information and preparation
Algorithm development and evaluation
Conclusion