Abstract

Frequent sequence mining is computationally challenging because it generates many intermediate subsequences. Mining frequent closed sequences and generators instead is more efficient and yields concise representations from which all traditional frequent patterns can be recovered without loss of information. Moreover, frequent closed sequences can be combined with generators to produce non-redundant sequential rules and to quickly recover all sequential patterns together with their frequencies. However, most proposed algorithms discover either closed sequences or generators, not both at once, and on large databases containing many long sequences they still take too long to finish or run out of memory. This paper therefore exploits multi-core processor architectures and proposes a novel parallel algorithm, Par-GenCloSM, that mines frequent closed sequences and generators simultaneously in a single process. Par-GenCloSM relies on efficient techniques for quickly eliminating unpromising candidate branches and on two novel strategies, EPUCloGen and GPPCloGen, that reduce the global synchronization cost of the parallel model and speed up the mining process. Par-GenCloSM is the first parallel algorithm for mining frequent closed sequences and generators concurrently. Experimental results on many real-life and synthetic databases show that Par-GenCloSM outperforms state-of-the-art algorithms in both runtime and memory consumption, especially on databases of long sequences with low minimum support thresholds.
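
To make the two pattern types concrete, the following is a minimal brute-force sketch under the standard definitions: a frequent sequence is closed if no proper supersequence has the same support, and a generator if no proper subsequence has the same support. It is not Par-GenCloSM and uses no pruning or parallelism. The function names, the toy database, and the choices to treat each sequence as a string of single items and to ignore the empty sequence are all assumptions made for illustration.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def frequent_sequences(database, min_support):
    """Brute force: map every frequent non-empty subsequence to its support."""
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in idx))
    support = {}
    for cand in candidates:
        count = sum(is_subsequence(cand, seq) for seq in database)
        if count >= min_support:
            support[cand] = count
    return support


def closed_and_generators(support):
    """Split frequent patterns into closed sequences and generators."""
    closed, generators = set(), set()
    for pat, sup in support.items():
        # Closed: no strictly longer frequent pattern contains `pat` with equal support.
        if not any(len(o) > len(pat) and s == sup and is_subsequence(pat, o)
                   for o, s in support.items()):
            closed.add(pat)
        # Generator: no strictly shorter (non-empty) pattern inside `pat` has equal support.
        if not any(len(o) < len(pat) and s == sup and is_subsequence(o, pat)
                   for o, s in support.items()):
            generators.add(pat)
    return closed, generators


# Hypothetical toy database: each string is one sequence of single items.
database = ["abc", "abc", "ab", "b"]
support = frequent_sequences(database, min_support=2)
closed, generators = closed_and_generators(support)
print(sorted(closed), sorted(generators))
```

On this toy database seven sequences are frequent, but only three are closed and three are generators, which illustrates why the two sets form a concise representation. The support of any other frequent sequence, for example `('a',)`, equals the largest support among the closed sequences that contain it.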
