Abstract

Mining frequent itemsets over transaction data streams is critical for many applications, such as wireless sensor networks, analysis of retail market data, and stock market prediction. The sliding window method is an important way of mining frequent itemsets over data streams. The speed of a sliding-window method depends not only on the efficiency of the mining algorithm, but also on the efficiency of updating the data held in the window. In this paper, we propose a new data structure with a Tail Pointer Table and a corresponding mining algorithm; we also propose an algorithm, COFI2, a revised version of the frequent itemset mining algorithm COFI (Co-Occurrence Frequent-Item), to reduce time and memory requirements. Theoretical analysis and experiments are carried out to demonstrate their effectiveness.

Highlights

  • Since Agrawal [1] developed Apriori, the first algorithm for mining frequent itemsets from a static sales dataset, in 1994, new algorithms have been proposed continually for various sub-domains of frequent itemset mining, such as traditional frequent itemsets in certain datasets [2, 3, 4, 5, 6], high utility itemsets [7, 8, 9, 10, 11], and frequent itemsets in uncertain datasets [12, 13, 14]

  • We propose a new data structure, called TPT-tree (Tail Pointer Table tree), to store the stream data of a window; it improves the efficiency of updating data and uses less memory than DST/DSP. We also propose a corresponding algorithm, called COFI2, for mining frequent itemsets over data streams

  • The experiments show that our proposed algorithm TPT achieves better performance than DST under varied minimum support thresholds and varied batch sizes, and that its advantage remains stable as the data stream accumulates


Summary

Introduction

Since Agrawal [1] developed Apriori, the first algorithm for mining frequent itemsets from a static sales dataset, in 1994, new algorithms have been proposed continually for various sub-domains of frequent itemset mining, such as traditional frequent itemsets in certain datasets [2, 3, 4, 5, 6], high utility itemsets [7, 8, 9, 10, 11], and frequent itemsets in uncertain datasets [12, 13, 14]. These approaches can be classified into two categories: level-wise approaches and pattern-growth approaches. We propose a new data structure, called TPT-tree (Tail Pointer Table tree), to store the stream data of a window; it improves the efficiency of updating data and uses less memory than DST/DSP. We also propose a corresponding algorithm, called COFI2, for mining frequent itemsets over data streams. The organization of this article is as follows: Section 2 discusses related work; Section 3 provides a description of the problem and defines relevant terms; Section 4 introduces the TPT-tree structure and a corresponding algorithm; Section 5 shows the experimental results; and Section 6 gives conclusions.
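To make the sliding-window setting concrete, the following is a minimal sketch of the problem that the TPT-tree and COFI2 address. It is not the TPT-tree or COFI2 itself: it is a naive baseline that enumerates every subset of each transaction, which is only feasible for short transactions. The class name `SlidingWindowMiner`, its parameters, and the use of an absolute support count are all assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowMiner:
    """Naive sliding-window frequent-itemset miner (illustrative baseline only)."""

    def __init__(self, window_size, min_support):
        self.window_size = window_size  # number of batches kept in the window
        self.min_support = min_support  # absolute support threshold (a count)
        self.batches = deque()          # batches currently inside the window
        self.counts = Counter()         # itemset (sorted tuple) -> support in window

    @staticmethod
    def _itemsets(transaction):
        """Yield every non-empty subset of a transaction as a sorted tuple."""
        items = sorted(transaction)
        for k in range(1, len(items) + 1):
            yield from combinations(items, k)

    def add_batch(self, batch):
        """Insert a new batch; evict the oldest batch if the window is full."""
        batch = [frozenset(t) for t in batch]
        for transaction in batch:
            self.counts.update(self._itemsets(transaction))
        self.batches.append(batch)

        if len(self.batches) > self.window_size:
            expired = self.batches.popleft()
            for transaction in expired:
                for itemset in self._itemsets(transaction):
                    self.counts[itemset] -= 1
                    if self.counts[itemset] == 0:
                        del self.counts[itemset]  # keep the table compact

    def frequent_itemsets(self):
        """Return itemsets whose support in the window meets the threshold."""
        return {s: c for s, c in self.counts.items() if c >= self.min_support}


if __name__ == "__main__":
    miner = SlidingWindowMiner(window_size=2, min_support=2)
    miner.add_batch([{"a", "b"}, {"a", "c"}])
    miner.add_batch([{"a", "b"}, {"b", "c"}])
    print(miner.frequent_itemsets())
    miner.add_batch([{"c"}])  # the first batch now slides out of the window
    print(miner.frequent_itemsets())
```

The sketch shows why update cost matters: every arriving batch triggers both an insertion and, once the window is full, an eviction. Here both steps cost time exponential in the transaction length, which is the kind of overhead that compact tree structures are meant to avoid.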

Related work
Data structures
Algorithms of mining frequent itemsets
Description of the problem
Structure of TPT-tree
Modify tail support numbers of TPT
Experimental analyses
Findings
Conclusions