Abstract
Sequential pattern mining (SPM) is a widely used data mining technique for discovering common sequences of events in large databases. When compared with the simple set mining problem and string mining problem, the hierarchical structure of sequential pattern mining (due to the need to consider frequent subsets within each itemset, as well as order among itemsets) and the resulting large permutation space makes SPM extremely expensive on conventional processor architectures. We propose a hardware-accelerated solution of the SPM using Micron's Automata Processor (AP), a hardware implementation of non-deterministic finite automata (NFAs). The Generalized Sequential Pattern (GSP) algorithm for SPM searching exposes massive parallelism, and is therefore well-suited for AP acceleration. We implement the multi-pass pruning strategy of the GSP via the AP's fast reconfigurability. A generalized automaton structure is proposed by flattening sequential patterns to simple strings to reduce compilation time and to minimize overhead of reconfiguration. Up to 90X and 29X speedups are achieved by the AP-accelerated GSP on six real-world datasets, when compared with the optimized multicore CPU and GPU GSP implementations, respectively. The proposed CPU-AP solution also outperforms the state-of-the-art PrefixSpan and SPADE algorithms on multicore CPU by up to 452X and 49X speedups. The AP advantage grows further with larger datasets.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have