This paper addresses the low efficiency of the FP-growth algorithm for frequent pattern mining and proposes an improved algorithm, ICFM-growth. Experimental results show that the improved algorithm outperforms FP-growth in both runtime and space utilization. Of the two classical frequent pattern mining algorithms, Apriori and FP-growth, the latter was developed as an improvement on Apriori. It avoids Apriori's generation of large numbers of candidate itemsets and its heavy memory consumption. Although FP-growth mines more efficiently than Apriori, it struggles with large transactional databases containing long transactions. In these cases it must construct numerous FP-trees, which increases the computational workload, prolongs runtime, and degrades mining efficiency. To address this issue, ICFM-growth constructs a co-occurrence frequency matrix to pre-screen the transaction set. The screening retains high-frequency, co-occurring item pairs and thereby avoids unnecessary computations. Because the important item pairs have already been filtered in the initial stage, the algorithm can operate directly on these item pairs and items rather than on the entire dataset, which reduces the search space and computational complexity. In addition, the FP-tree structure enables the algorithm to store and process data more efficiently, avoiding the repeated full-database scans required by the traditional Apriori algorithm. Finally, simulation experiments on publicly available datasets such as Movie Data and House Data, validated with cross-validation, show that ICFM-growth is significantly superior to FP-growth in time and space efficiency, with faster runtime, lower memory consumption, and higher mining efficiency.
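The pre-screening step with a co-occurrence frequency matrix can be illustrated with a minimal sketch. The abstract does not give the exact screening rule. This sketch therefore assumes an absolute minimum-support count (`min_support`) applied both to single items and to item pairs, and the function name `cooccurrence_screen` is hypothetical. It is not the authors' ICFM-growth implementation. Restricting the matrix to frequent items is safe because a pair cannot be frequent if either of its items is infrequent (the Apriori property).

```python
from itertools import combinations


def cooccurrence_screen(transactions, min_support):
    """Hypothetical pre-screening step (an assumption, not ICFM-growth itself):
    build an item co-occurrence frequency matrix and keep frequent items and
    frequent item pairs, using an absolute support-count threshold."""
    # Count single-item supports: each item counts once per transaction.
    item_counts = {}
    for t in transactions:
        for item in set(t):
            item_counts[item] = item_counts.get(item, 0) + 1

    # Keep only frequent items; by the Apriori property, any frequent pair
    # must consist of two frequent items.
    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)
    index = {item: k for k, item in enumerate(frequent_items)}
    n = len(frequent_items)

    # Symmetric matrix: matrix[a][b] = number of transactions containing both items.
    matrix = [[0] * n for _ in range(n)]
    for t in transactions:
        present = sorted(index[i] for i in set(t) if i in index)
        for a, b in combinations(present, 2):
            matrix[a][b] += 1
            matrix[b][a] += 1

    # Keep item pairs whose co-occurrence count meets the threshold.
    frequent_pairs = {
        (frequent_items[a], frequent_items[b]): matrix[a][b]
        for a in range(n)
        for b in range(a + 1, n)
        if matrix[a][b] >= min_support
    }
    return frequent_items, matrix, frequent_pairs


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    items, matrix, pairs = cooccurrence_screen(transactions, min_support=3)
    print(pairs)
    # {('beer', 'diaper'): 3, ('bread', 'diaper'): 3, ('bread', 'milk'): 3, ('diaper', 'milk'): 3}
```

In this toy run, the infrequent items "eggs" and "cola" are discarded before the matrix is built, and only four of the six candidate pairs survive. A later FP-tree stage could then restrict itself to these items and pairs instead of the full dataset.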