Abstract
Frequent pattern mining plays an increasingly important role in a growing number of real-time data stream scenarios, such as large-scale order streams, network traffic monitoring, and web access log streams. The continuous, unbounded, and high-speed nature of massive data streams poses a major challenge for current frequent pattern mining approaches. The main difficulty is that, as the stream keeps arriving, infrequent patterns that have already been discarded may later become frequent again. In this paper, targeting the characteristics of real-time data streams, we propose a compact data structure, called CPS-tree, to maintain and operate on the complete information of the data stream. Compared with related work, our algorithm dynamically supports large-scale data streams with a single pass over the data and can be easily applied to other data stream processing environments. Moreover, load imbalance is a common problem in current parallel frequent pattern mining. We analyze the features of data streams and propose a depth-based strategy to address the imbalance problem in our parallel algorithm. In summary, we propose BPFPMS, a balanced parallel frequent pattern mining algorithm over massive data streams, to dynamically and efficiently mine frequent patterns over large-scale data streams. Our experiments show that the algorithm achieves good speedup and a good degree of load balance across nodes under different degrees of parallelism.
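To make the one-pass idea concrete, the following is a minimal Python sketch of a generic prefix tree (in the style of an FP-tree) that inserts every transaction as it arrives, so no item or pattern information is discarded and supports can still be computed later. It is not the authors' CPS-tree: the class names `Node` and `PrefixTree`, the fixed lexicographic item order, and the `support` query are hypothetical choices made for illustration. The abstract does not specify how CPS-tree orders, restructures, or expires nodes, nor how BPFPMS partitions work by depth across nodes.

```python
from collections import defaultdict


class Node:
    """A prefix-tree node: one item, a pass-through count, and child links."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0       # transactions whose path passes through this node
        self.children = {}   # item -> Node


class PrefixTree:
    """Hypothetical one-pass prefix tree over a transaction stream (not CPS-tree itself)."""

    def __init__(self):
        self.root = Node()
        self.n_transactions = 0
        self.item_counts = defaultdict(int)  # kept for every item, frequent or not

    def insert(self, transaction):
        """Insert one transaction as it arrives; nothing is pruned or discarded."""
        self.n_transactions += 1
        node = self.root
        # Assumption: a fixed lexicographic order so shared prefixes share nodes.
        for item in sorted(set(transaction)):
            self.item_counts[item] += 1
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child

    def support(self, itemset):
        """Number of inserted transactions that contain every item in `itemset`."""
        target = sorted(set(itemset))
        if not target:
            return self.n_transactions
        return self._count(self.root, target, 0)

    def _count(self, node, target, i):
        total = 0
        for child in node.children.values():
            if child.item == target[i]:
                if i + 1 == len(target):
                    # Every path through this node contains all items of target.
                    total += child.count
                else:
                    total += self._count(child, target, i + 1)
            elif child.item < target[i]:
                # target[i] may still appear deeper along this path.
                total += self._count(child, target, i)
            # Otherwise items only increase along a path, so target[i] cannot appear below.
        return total


# Usage: a tiny stream of four transactions, processed in a single pass.
tree = PrefixTree()
for txn in [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]:
    tree.insert(txn)

print(tree.support(["a", "c"]))       # 2
print(tree.support(["b"]))            # 3
print(tree.support(["a", "b", "c"]))  # 1
```

Because each node stores how many transactions pass through it, the support of an itemset is the sum of the counts at the nodes where its last item is matched; the sorted order lets the search skip any branch whose item already exceeds the item currently being looked for.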