Abstract
Sliding windows are a widely used model for mining frequent patterns in data streams. Choosing an appropriate window size requires knowing in advance when, and by how much, the data stream will change. Such prior knowledge is difficult to obtain, and the fixed-size sliding windows used by traditional algorithms cannot adapt to these changes, so those algorithms perform poorly on the latest concepts in the stream. To address these shortcomings, this paper proposes VSW-CDD (Variable Sliding Window-Concept Drift Detection), a frequent pattern mining algorithm that uses a variable sliding window driven by concept drift detection and is suited to mining frequent patterns in changing data streams. The window size is determined dynamically by whether concept drift occurs in the data stream. During mining, the algorithm simultaneously monitors variables describing the mining results and variables describing the causes of concept drift. While the data stream is stable and no concept change occurs, the window keeps expanding. When concept drift occurs, the window size is adjusted according to the difference between detection nodes. Extensive experiments on both real and synthetic data show that VSW-CDD detects concept drift in the data stream in a timely manner and adapts to new concepts by adjusting the window size. As a result, it mines the latest frequent patterns in the data stream and performs better on click streams from e-commerce sites and on medical data. Compared with other algorithms, VSW-CDD also performs better in terms of recall and adaptability.
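The abstract does not specify the drift test or the resizing rule, so the following is only a minimal Python sketch of the general idea of a window that grows while the stream is stable and shrinks on drift. It is not VSW-CDD itself. It assumes: frequent itemsets of at most two items as the monitored mining result; the Jaccard distance between the pattern sets found at consecutive detection nodes as the "difference between detection nodes"; and a shrink rule that keeps a fraction (1 - difference) of the window. The names `VariableWindowMiner`, `check_every`, `drift_threshold` and `min_window` are hypothetical, and the cause variables that VSW-CDD also monitors are not modeled.

```python
from collections import Counter, deque
from itertools import combinations


def frequent_itemsets(window, min_support, max_len=2):
    """Itemsets of up to max_len items whose relative support in window is >= min_support."""
    if not window:
        return set()
    counts = Counter()
    for transaction in window:
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    n = len(window)
    return {itemset for itemset, c in counts.items() if c / n >= min_support}


def pattern_difference(old, new):
    """Jaccard distance between two frequent-pattern sets: 0 = identical, 1 = disjoint."""
    union = old | new
    if not union:
        return 0.0
    return 1.0 - len(old & new) / len(union)


class VariableWindowMiner:
    """Hypothetical variable-size sliding window: grows while stable, shrinks on drift."""

    def __init__(self, check_every=50, min_support=0.3, drift_threshold=0.5, min_window=50):
        self.window = deque()
        self.check_every = check_every          # transactions between detection nodes
        self.min_support = min_support
        self.drift_threshold = drift_threshold  # pattern difference treated as drift (assumed)
        self.min_window = min_window            # never shrink below this many transactions
        self.last_patterns = None               # patterns found at the previous detection node
        self.since_check = 0
        self.seen = 0
        self.drifts = []                        # stream positions where drift was flagged

    def add(self, transaction):
        # No eviction here: while no drift is detected, the window keeps expanding.
        self.window.append(transaction)
        self.seen += 1
        self.since_check += 1
        if self.since_check == self.check_every:
            self.since_check = 0
            self._detection_node()

    def _detection_node(self):
        # Mining-result variable: frequent patterns of the block since the last node.
        recent = list(self.window)[-self.check_every:]
        current = frequent_itemsets(recent, self.min_support)
        if self.last_patterns is not None:
            diff = pattern_difference(self.last_patterns, current)
            if diff > self.drift_threshold:
                # Drift: the larger the difference, the more old data is discarded.
                keep = max(self.min_window, int(len(self.window) * (1.0 - diff)))
                while len(self.window) > keep:
                    self.window.popleft()
                self.drifts.append(self.seen)
        self.last_patterns = current

    def patterns(self):
        """Frequent patterns over the current (variable-size) window."""
        return frequent_itemsets(self.window, self.min_support)


# Synthetic stream whose concept switches abruptly after 300 transactions.
stream = [["a", "b"]] * 300 + [["x", "y"]] * 300
miner = VariableWindowMiner()
for t in stream:
    miner.add(t)
print(miner.drifts, len(miner.window), sorted(miner.patterns()))
# → [350] 300 [('x',), ('x', 'y'), ('y',)]
```

On this synthetic stream the window grows to 350 transactions, drift is flagged at the first detection node after the switch (position 350), the window collapses to the most recent 50 transactions, and it then grows again while the new concept stays stable, so the reported patterns reflect only the latest concept.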