A Survey of Key Technologies for High Utility Patterns Mining

Chunyan Zhang,Meng Han,Shiyu Du,Mingyao Shen,Rui Sun

doi:10.1109/access.2020.2981962


Open Access

https://doi.org/10.1109/access.2020.2981962


Abstract

High utility pattern mining (HUPM) has recently become one of the most important research issues in data mining. It has been widely used because it can consider both the non-binary frequency values of items in a transaction and the different profit values of each item. First, this paper briefly describes the related concepts, formulas, and application examples of HUPM. Second, the key technologies for HUPM are introduced in detail and divided into main methods, including Apriori-based, tree-based, projection-based, list-based, data format-based, and index-based methods. The paper further compares the data sets, uses, advantages, and disadvantages of the algorithms, laying the foundation for the next research direction. Then, this article outlines the high utility derivative patterns, including high average utility patterns, high utility sequential patterns, and high utility compact patterns. Because static data can hardly meet practical needs, this paper also summarizes HUPM methods over data streams, mainly those based on incremental methods, the sliding window model, the time decay model, and the landmark model.

Highlights

Frequent itemsets mining (FIM) is one of the core tasks in data mining
Some novel algorithms offer new perspectives on utility mining. Local high utility itemsets (LHUI)-Miner [39] considers itemsets that have high utility within a specific period of time, so it can mine patterns that traditional high utility itemset mining (HUIM) cannot find, while reducing runtime and memory consumption. HUOPM [55] extends the occupancy rate to evaluate the transaction database, which to a certain extent provides a new research perspective for utility mining
HAUI-Miner [72] is based on an efficient average-utility (AU) list structure that preserves only the information needed in the mining process, compressing very large databases into a compact structure so that high average-utility itemsets (HAUIs) can be discovered more efficiently

Summary

INTRODUCTION

Frequent itemsets mining (FIM) is one of the core tasks in data mining. FIM mines the itemsets that often appear together in a transaction database and assumes that all items have the same importance (unit profit, price, etc.).

One algorithm uses a two-phase periodic utility upper limit pattern to avoid information loss during mining. It can discover itemsets that customers purchase regularly and that generate high profits. Because it considers the relative order of transactions, it tends to find patterns whose utility is stable throughout the database. Another algorithm is used to mine HUIs with multiple minimum utility thresholds (minutils); it introduces the concept of minimal suffix utility and proposes a generalized pruning strategy for mining HUIs efficiently. HUI-list-INS [15] is an incremental algorithm for inserting transactions in a dynamic environment; it reduces the amount of computation without generating a candidate list and uses an enumeration tree and 2-itemsets to speed up the calculation.

BASIC CONCEPTS
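
The basic quantities behind HUPM can be illustrated with a small sketch. It assumes the standard definitions: the utility of an item in a transaction is its purchased quantity times its unit profit, the utility of an itemset is summed over the transactions that contain it, and the transaction-weighted utility (TWU) is the total utility of those transactions. The database, profit values, and function names below are hypothetical and chosen only for illustration; the exhaustive enumeration is not one of the surveyed algorithms.

```python
from itertools import combinations

# Hypothetical unit profits per item (illustrative values only).
PROFIT = {"a": 5, "b": 2, "c": 1}

# Hypothetical transactions, each mapping an item to its purchased quantity.
DB = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 4},
]


def contains(transaction, itemset):
    """Return True if every item of `itemset` occurs in `transaction`."""
    return all(item in transaction for item in itemset)


def support(itemset, db=DB):
    """FIM view: count the transactions that contain the itemset."""
    return sum(1 for t in db if contains(t, itemset))


def itemset_utility(itemset, db=DB, profit=PROFIT):
    """HUPM view: sum quantity * unit profit over containing transactions."""
    return sum(
        sum(t[item] * profit[item] for item in itemset)
        for t in db
        if contains(t, itemset)
    )


def transaction_utility(transaction, profit=PROFIT):
    """Total utility of all items in one transaction."""
    return sum(q * profit[item] for item, q in transaction.items())


def twu(itemset, db=DB):
    """Transaction-weighted utility: an upper bound on the itemset utility."""
    return sum(transaction_utility(t) for t in db if contains(t, itemset))


def high_utility_itemsets(min_util, db=DB):
    """Brute-force enumeration of all itemsets with utility >= min_util."""
    items = sorted({item for t in db for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            u = itemset_utility(candidate, db)
            if u >= min_util:
                result[candidate] = u
    return result


print(high_utility_itemsets(10))
```

In this toy database every single item has the same support of 2, yet the utilities differ. With a threshold of 10, the itemset {a, c} is a high utility itemset although {c} alone is not, which shows that utility is not anti-monotone. TWU is commonly used as an over-estimate for pruning for exactly this reason.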

HIGH UTILITY DERIVATIVE PATTERNS
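
As a rough illustration of one derivative pattern, the sketch below computes the average utility of an itemset. It assumes the usual definition in high average-utility itemset mining: the utility of the itemset divided by the number of items it contains, which offsets the tendency of longer itemsets to accumulate higher utility. The data and function names are hypothetical, and the code does not reproduce the AU-list structure of HAUI-Miner.

```python
# Hypothetical unit profits and transactions (illustrative values only).
PROFIT = {"a": 5, "b": 2, "c": 1}
DB = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 4},
]


def utility(itemset, db=DB, profit=PROFIT):
    """Sum quantity * unit profit over the transactions containing the itemset."""
    return sum(
        sum(t[item] * profit[item] for item in itemset)
        for t in db
        if all(item in t for item in itemset)
    )


def average_utility(itemset, db=DB, profit=PROFIT):
    """Utility of the itemset divided by its length (number of items)."""
    return utility(itemset, db, profit) / len(itemset)


def is_high_average_utility(itemset, min_avg_util, db=DB, profit=PROFIT):
    """Check the itemset against a user-chosen average-utility threshold."""
    return average_utility(itemset, db, profit) >= min_avg_util


print(average_utility(("a",)), average_utility(("a", "c")))
```

Here {a, c} has a utility of 13 but an average utility of only 6.5, so with a threshold of 10 it qualifies as a high utility itemset but not as a high average-utility itemset.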

METHODS

METHODS BASED ON TIME DECAY MODELS

Findings

NEXT DIRECTION