Abstract

Frequent pattern mining plays an increasingly important role in a growing number of real-time data stream scenarios, such as large-scale order streams, network traffic monitoring, and web access record streams. The continuous, unbounded, and high-speed nature of massive data streams poses a major challenge for current frequent pattern mining approaches. The main difficulty is that, as data continuously arrives, infrequent patterns that were previously discarded may become frequent again. In this paper, targeting the characteristics of real-time data streams, we propose a compact data structure, called the CPS-tree, to maintain and operate on the full information of the data stream. Compared with related work, our algorithm dynamically supports large-scale data streams with a one-pass scan and can be easily applied to other data stream processing environments. Moreover, load imbalance is a common problem in current frequent pattern mining. We analyze the features of data streams and propose a depth-based strategy to solve the imbalance problem in our parallel algorithm. In summary, we propose BPFPMS, a balanced parallel frequent pattern mining algorithm over massive data streams, to dynamically and efficiently mine frequent patterns over large-scale data streams. Our experiments show that the algorithm achieves good speedup and good load balance across nodes at different degrees of parallelism.
