Abstract
AbstractExtracting frequent itemsets from datasets is an important problem in data mining, for which several mining methods including FP‐Growth have been proposed. FP‐Growth is a classical frequent itemset mining method, which generates pattern databases without candidates. Many improvements have been made in the literature due to the high time complexity and memory usage of FP‐Growth. However, most of them still suffer from performance issues on large datasets. In this paper, we design an auxiliary structure, Array Prefix‐Tree (AP‐Tree), and propose a new algorithm, Array Prefix‐Tree Growth (APT‐Growth), which is further parallelized as a Spark workflow, referred to as PAPT‐Growth. Based on a density threshold, we incorporate an adaptive algorithm selection process into PAPT‐Growth to ensure its running time performance. We conduct extensive experiments on different thresholds and multiple datasets, and experimental results show the performance superiority of PAPT‐Growth in comparison with several state‐of‐the‐art methods such as PFP, YAFIM, and DFPS. The analysis on density reveals a changing point, which justifies the necessity and validity of adaptive algorithm selection.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have