RFIMiner: A regression-based algorithm for recently frequent patterns in multiple time granularity data streams

Lifeng Jia, Zhe Wang, Nan Lu, Xiujuan Xu, Dongbin Zhou, Yan Wang

doi:10.1016/j.amc.2006.06.115

Abstract

In this paper, we propose an algorithm for computing and maintaining recently frequent patterns, which are more stable and smaller than the data stream itself, and for dynamically updating them as new transactions arrive. Our study makes two main contributions. First, a regression-based data stream model is proposed to differentiate new transactions from old ones. The model maps transactions onto multiple time granularities and automatically adjusts the transactional fading rate by means of a fading factor, which defines the desired lifetime of transaction information in the data stream. Second, we develop RFIMiner, a single-scan algorithm for mining recently frequent patterns from data streams. The algorithm exploits a special property of suffix-trees, so the suffix-trees need not be traversed when patterns are discovered. To suit the suffix-trees, we also adopt a new method, Depth-first and Bottom-up Inside Itemset Growth, which finds further recently frequent patterns from known frequent ones and avoids redundant computation and candidate pattern generation. We conduct detailed experiments to evaluate the performance of the algorithm in several respects. The results confirm that the new method scales well and meets the quality and efficiency requirements of mining recently frequent itemsets in data streams.
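The abstract does not state the regression-based model or the exact role of the fading factor, so the following is only a minimal sketch of the general idea of time-faded support counting. It assumes a plain exponential decay, which stands in for (and is not) the paper's model. The class name `DecayedItemsetCounter` and the parameters `fading_factor`, `max_size`, and `min_support` are hypothetical names chosen for illustration, and the brute-force subset enumeration is not RFIMiner's suffix-tree method.

```python
from collections import defaultdict
from itertools import combinations


class DecayedItemsetCounter:
    """Toy time-faded support counter (illustrative only, not RFIMiner)."""

    def __init__(self, fading_factor, max_size=2):
        # Assumption: each new transaction multiplies all older weights by
        # fading_factor, so a value of 1.0 means "never forget".
        if not 0.0 < fading_factor <= 1.0:
            raise ValueError("fading_factor must be in (0, 1]")
        self.fading_factor = fading_factor
        self.max_size = max_size
        self.counts = defaultdict(float)  # itemset (sorted tuple) -> faded count
        self.total = 0.0                  # faded number of transactions

    def add(self, transaction):
        # Age every count recorded so far, then the transaction total.
        for key in self.counts:
            self.counts[key] *= self.fading_factor
        self.total = self.total * self.fading_factor + 1.0
        # Credit every itemset of size <= max_size in the new transaction.
        items = sorted(set(transaction))
        for size in range(1, self.max_size + 1):
            for subset in combinations(items, size):
                self.counts[subset] += 1.0

    def recently_frequent(self, min_support):
        # Faded support = faded count / faded number of transactions.
        return {
            pattern: count / self.total
            for pattern, count in self.counts.items()
            if count / self.total >= min_support
        }


stream = [["a", "b"], ["a", "b"], ["c"], ["c"], ["c"]]

fading = DecayedItemsetCounter(fading_factor=0.5)
no_fading = DecayedItemsetCounter(fading_factor=1.0)
for transaction in stream:
    fading.add(transaction)
    no_fading.add(transaction)

recent = fading.recently_frequent(min_support=0.3)
overall = no_fading.recently_frequent(min_support=0.3)
```

On this toy stream, the fading counter keeps only the itemset `("c",)` at a support threshold of 0.3, while the non-fading counter also keeps `("a",)`, `("b",)`, and `("a", "b")`. A smaller fading factor corresponds to a shorter lifetime for old transactions, which is the behaviour the abstract attributes to its fading factor.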

Full Text