Improved Genetic Algorithm for High-Utility Itemset Mining

Qiang Zhang, Wei Fang, Quan Wang, Jun Sun

doi:10.1109/access.2019.2958150

Abstract

High-utility itemset mining (HUIM) is an important research topic in the data mining field. Traditional HUIM algorithms must typically contend with an exponentially large search space when the database size or the number of distinct items is very large. As an effective alternative, evolutionary computation (EC)-based algorithms have been proposed to solve HUIM problems because they can obtain a set of nearly optimal solutions in limited time. However, it is still time-consuming for EC-based algorithms to find the complete set of high-utility itemsets (HUIs) in transactional databases. To address this problem, we propose an HUIM algorithm based on an improved genetic algorithm (HUIM-IGA). A neighborhood exploration strategy is proposed to improve search efficiency for HUIs. To reduce missing HUIs, a population diversity maintenance strategy is employed in the proposed HUIM-IGA. An individual repair method is also introduced to reduce invalid combinations when discovering HUIs. In addition, an elite strategy is employed to prevent the loss of HUIs. Experimental results obtained on a set of real-world datasets demonstrate that the proposed algorithm can find the complete set of HUIs for a given minimum utility threshold, and that HUIM-IGA requires less time than state-of-the-art EC-based HUIM algorithms to mine the same number of HUIs.

Highlights

Data mining refers to the process of extracting potentially valuable information or patterns from a large amount of data [1], [2].
The average runtime that the high-utility itemset mining (HUIM) algorithms HUIM-BPSO, HUIM-BPSOsig, and HUPEumu-GRAM required to mine all high-utility itemsets (HUIs) exceeded three hours on most datasets, because the standard search route of the traditional genetic algorithm (GA) or particle swarm optimization (PSO) gradually loses search ability as population diversity declines. Some HUIs may therefore be missed, which makes it difficult to find all HUIs in practical time.
Evolutionary computation (EC)-based HUIM algorithms, such as HUPEumu-GRAM, HUIM-BPSOsig, HUIM-BPSO, and Bio-HUIF-GA, have an advantage in mining HUIs compared to traditional HUIM algorithms.

Summary

INTRODUCTION

Data mining refers to the process of extracting potentially valuable information or patterns from a large amount of data [1], [2].

2) INDIVIDUAL REPAIR STRATEGY

Due to the randomness of EC, EC-based algorithms typically generate combinations of items that do not exist in the dataset during the population initialization and genetic phases, which wastes execution time and increases the search space. The proposed pruning strategy performs sequential pruning based on the TWU values of 1-HTWUIs and attempts to save item combinations that may generate high utility in the individual.

3) NEIGHBORHOOD EXPLORATION STRATEGY FOR REPEATED HUIS

Due to the randomness of EC, the EC-based HUIM algorithms will produce many meaningless combinations of items, and many duplicate HUIs. As a result, execution time is inevitably increased by these repeated HUIs.

Input: HUI, binary coded representation of high-utility itemset; len, the length of individual coding.

// Repair individuals
pop[i] ← Indiv_Repair(pop[i], len, Sorted_TWU_index);
if pop[i] ∈ HUIs

Replace two randomly selected individuals of
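The two strategies summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy database, the profit table, and the function names (`repair`, `unexplored_neighbors`, and the others) are hypothetical. The repair rule is assumed to keep set bits in descending TWU order for as long as the resulting itemset still occurs in at least one transaction, and the neighborhood of a repeated HUI is assumed to be its single-bit flips.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS: List[Dict[str, int]] = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 4},
    {"c": 2, "d": 1},
]
# Hypothetical unit profit of each item.
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1, "d": 3}


def transaction_utility(tx: Dict[str, int]) -> int:
    """Total utility of one transaction."""
    return sum(qty * PROFIT[item] for item, qty in tx.items())


def twu(item: str) -> int:
    """Transaction-weighted utility: utilities of all transactions containing item."""
    return sum(transaction_utility(tx) for tx in TRANSACTIONS if item in tx)


def occurs(itemset: Set[str]) -> bool:
    """True if at least one transaction contains every item of the itemset."""
    return any(all(item in tx for item in itemset) for tx in TRANSACTIONS)


def utility(itemset: Set[str]) -> int:
    """Utility of an itemset, summed over the transactions that contain it."""
    total = 0
    for tx in TRANSACTIONS:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * PROFIT[item] for item in itemset)
    return total


def sorted_htwui(min_util: int) -> List[str]:
    """1-HTWUIs ordered by descending TWU (ties broken alphabetically)."""
    items = [item for item in PROFIT if twu(item) >= min_util]
    return sorted(items, key=lambda item: (-twu(item), item))


def repair(bits: List[int], items: List[str]) -> List[int]:
    """Assumed repair rule: visit set bits in TWU order and keep an item
    only if the itemset built so far still occurs in the database."""
    kept: Set[str] = set()
    repaired = [0] * len(bits)
    for pos, item in enumerate(items):
        if bits[pos] and occurs(kept | {item}):
            kept.add(item)
            repaired[pos] = 1
    return repaired


def unexplored_neighbors(
    bits: List[int], found: Set[Tuple[int, ...]]
) -> List[List[int]]:
    """Assumed neighborhood: non-empty single-bit flips not yet recorded as HUIs."""
    neighbors = []
    for pos in range(len(bits)):
        candidate = list(bits)
        candidate[pos] ^= 1
        if any(candidate) and tuple(candidate) not in found:
            neighbors.append(candidate)
    return neighbors


MIN_UTIL = 12                            # hypothetical minimum utility threshold
ITEMS = sorted_htwui(MIN_UTIL)           # items in descending TWU order
REPAIRED = repair([1, 1, 1, 1], ITEMS)   # drops the item that makes the set invalid
```

In this toy setting, the individual with every bit set encodes an itemset that appears in no transaction, so the sketch clears the bit of the lowest-TWU item. A neighbor returned by `unexplored_neighbors` may itself encode a non-existent combination, so it would plausibly be passed through `repair` before its utility is evaluated.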

EXPERIMENTAL RESULTS AND DISCUSSIONS

CONCLUSION