Abstract
Mining frequent itemsets over transaction data streams is critical for many applications, such as wireless sensor networks, analysis of retail market data, and stock market prediction. The sliding window method is an important way of mining frequent itemsets over data streams. The speed of a sliding-window method is affected not only by the efficiency of the mining algorithm, but also by the efficiency of updating the data held in the window. In this paper, we propose a new data structure with a Tail Pointer Table and a corresponding mining algorithm; we also propose an algorithm, COFI2, a revised version of the frequent itemsets mining algorithm COFI (Co-Occurrence Frequent-Item), to reduce the time and memory requirements. Further, theoretical analysis and experiments are carried out to demonstrate their effectiveness.
Highlights
Since Agrawal [1] developed Apriori, the first algorithm for mining frequent itemsets from a static sales dataset, in 1994, new algorithms have been proposed constantly for various sub-domains of frequent itemsets mining, such as those for traditional frequent itemsets [2, 3, 4, 5, 6] in certain datasets, high utility itemsets [7, 8, 9, 10, 11], and frequent itemsets in uncertain datasets [12, 13, 14].
We propose a new data structure, called the TPT-tree (Tail Pointer Table tree), to store the stream data of a window; it improves the efficiency of updating data and uses less memory than DST/DSP. We also propose a corresponding algorithm, called COFI2, for mining frequent itemsets over data streams.
From the above experiments, we can see that our proposed algorithm TPT achieves better performance than DST under varied minimum support thresholds and varied batch sizes, and that its advantage remains stable as the data stream accumulates.
Summary
Since Agrawal [1] developed Apriori, the first algorithm for mining frequent itemsets from a static sales dataset, in 1994, new algorithms have been proposed constantly for various sub-domains of frequent itemsets mining, such as those for traditional frequent itemsets [2, 3, 4, 5, 6] in certain datasets, high utility itemsets [7, 8, 9, 10, 11], and frequent itemsets in uncertain datasets [12, 13, 14]. These approaches can be classified into two categories: level-wise approaches and pattern-growth approaches. We propose a new data structure, called the TPT-tree (Tail Pointer Table tree), to store the stream data of a window; it improves the efficiency of updating data and uses less memory than DST/DSP. We also propose a corresponding algorithm, called COFI2, for mining frequent itemsets over data streams. The organization of this article is as follows: Section 2 discusses related work; Section 3 provides a description of the problem and defines relevant terms; Section 4 introduces the TPT-tree structure and a corresponding algorithm; Section 5 shows the experimental results; and Section 6 gives conclusions.
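To make the sliding-window setting concrete, the following is a minimal Python sketch of a naive baseline. It is not the TPT-tree or COFI2 described in the paper: it simply keeps the most recent batches and recounts every itemset on demand, which is the kind of cost a dedicated window structure aims to avoid. The class name `SlidingWindowMiner`, its parameters `window_size`, `min_support`, and `max_len`, and the batch representation are all assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowMiner:
    """Hypothetical naive baseline: keep the last `window_size` batches, recount on demand."""

    def __init__(self, window_size, min_support):
        # A bounded deque drops the oldest batch automatically when a new one arrives.
        self.window = deque(maxlen=window_size)
        self.min_support = min_support  # relative threshold in (0, 1]

    def add_batch(self, transactions):
        # Each transaction is stored as a frozenset of items.
        self.window.append([frozenset(t) for t in transactions])

    def frequent_itemsets(self, max_len=3):
        # Flatten the window and count every itemset of up to `max_len` items.
        transactions = [t for batch in self.window for t in batch]
        threshold = self.min_support * len(transactions)
        counts = Counter()
        for t in transactions:
            items = sorted(t)
            for k in range(1, min(max_len, len(items)) + 1):
                counts.update(combinations(items, k))
        return {itemset: c for itemset, c in counts.items() if c >= threshold}


miner = SlidingWindowMiner(window_size=2, min_support=0.5)
miner.add_batch([["a", "b"], ["a", "c"]])
miner.add_batch([["a", "b"], ["b", "c"]])
first = miner.frequent_itemsets()

# A third batch pushes the first one out of the two-batch window.
miner.add_batch([["c", "d"], ["c"]])
second = miner.frequent_itemsets()
```

Because this sketch enumerates all subsets of each transaction on every query, its cost grows quickly with transaction length; it is meant only to show how the window contents, and therefore the frequent itemsets, change as batches arrive and expire.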
More From: International Journal of Computational Intelligence Systems