Abstract

Within data mining, frequent itemset mining (FIM) and high-utility itemset mining (HUIM) are widely studied, and many algorithms have been proposed to reveal the relationships among utility, frequency, and items in transaction databases. Although these algorithms can mine frequent or high-utility itemsets quickly, they treat frequency or utility as a single, unilateral criterion, while other factors (e.g., distance, price) can also be valuable for decision-making. A skyline framework has therefore been introduced to mine skyline frequent-utility patterns (SFUPs) and better support user decision-making, and several algorithms for it have been proposed in succession. However, the Internet of Things (IoT), the mobile Internet, and the traditional Internet generate massive amounts of data every day, and these state-of-the-art standalone algorithms cannot meet the challenge of finding interesting patterns in such data. Big data platforms rely on distributed, cloud-based architectures to filter and process this data and extract useful information. This paper proposes a novel parallel algorithm on Hadoop, structured as a three-stage iterative algorithm based on MapReduce. MapReduce divides the mining task over the entire large dataset into multiple independent sub-tasks so that frequent and high-utility patterns can be found in parallel. Extensive experiments show that the algorithm can handle large datasets and performs well on Hadoop clusters.
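
To illustrate how a single MapReduce pass can split utility computation over transaction splits, the following is a minimal Hadoop sketch, not the paper's implementation. The `MIN_UTILITY` threshold and the `item:quantity:unitProfit` record layout are illustrative assumptions.

```java
// Minimal sketch of one MapReduce pass that sums itemset utility across splits.
// Assumptions: input lines are transactions of "item:quantity:unitProfit" fields,
// and MIN_UTILITY stands in for a threshold normally read from the job config.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class UtilityCountPass {
    static final long MIN_UTILITY = 100; // assumed threshold

    public static class UtilityMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Parse one transaction and emit each item with its local utility.
            for (String field : value.toString().trim().split("\\s+")) {
                String[] parts = field.split(":");
                if (parts.length != 3) continue;
                long utility = Long.parseLong(parts[1]) * Long.parseLong(parts[2]);
                // Single items only here; later iterations would emit larger candidates.
                context.write(new Text(parts[0]), new LongWritable(utility));
            }
        }
    }

    public static class UtilityReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text itemset, Iterable<LongWritable> utilities, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable u : utilities) {
                total += u.get();
            }
            // Keep only candidates whose aggregated utility meets the threshold.
            if (total >= MIN_UTILITY) {
                context.write(itemset, new LongWritable(total));
            }
        }
    }
}
```

In the same spirit, each iteration of the three-stage algorithm would run such a job over independent data splits, with the reducer aggregating per-split counts and utilities before the next round of candidates is generated.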
