Abstract

This article presents a stream mining framework that clusters a data stream and monitors its evolution. Even though concept drift is expected in data streams, explicit drift detection is rarely performed in stream clustering algorithms. The proposed framework is capable of explicit concept drift detection and cluster evolution analysis. Concept drift is caused by changes in the data distribution over time. The relationship between concept drift and the occurrence of physical events was studied by applying the framework to weather data streams produced by automatic weather stations (AWS). The experiments led to the conclusion that concept drift accompanied by a change in the number of clusters indicates a significant weather event. This kind of online monitoring and its results can be utilized in weather forecasting systems in various ways.

Highlights

  • With the advancement in hardware and software technology, the number of applications producing large volume data streams is ever increasing

  • Surveys of data stream clustering (Ghesmoune et al., 2016; Silva et al., 2013) indicate that explicit concept drift detection and adaptation are rarely done in data stream clustering algorithms

  • An explicit concept drift detection methodology based on the Page-Hinkley Test (PHT) is adopted here, which monitors the stream continuously to detect probable concept changes


Summary

INTRODUCTION

With the advancement in hardware and software technology, the number of applications producing large-volume data streams is ever increasing. In a stream, the underlying data distributions change over time. These changes might affect the interrelationship between input and output variables (Gama, 2014), leading to 'concept drift' (Widmer and Kubat, 1996; Gama, 2014). This paper proposes a framework for the online clustering of data streams. It performs concept drift detection and cluster evolution monitoring to generate a warning about the dynamic changes taking place in the environment of the stream. The proposed framework has components to cluster the stream online, detect concept changes, and track the evolution of clusters. Because the source of the data is highly dynamic, the clustering structure might change correspondingly, and fixing the number of clusters k in advance limits the ability to capture such changes. The utility of this framework is studied by applying it to weather data.
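The highlights name the Page-Hinkley Test (PHT) as the basis for explicit drift detection. This summary does not say which quantity the framework monitors or how its parameters are set, so the following is only a minimal sketch of the textbook one-sided PHT (detecting an increase in the mean of a numeric signal). The class name `PageHinkley`, the helper `first_drift_index`, and the default values of `delta`, `threshold`, and `min_samples` are illustrative assumptions, not the authors' implementation.

```python
class PageHinkley:
    """Textbook one-sided Page-Hinkley test (detects an upward mean shift).

    All default parameter values are illustrative assumptions.
    """

    def __init__(self, delta=0.005, threshold=50.0, min_samples=30):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm level (often written lambda)
        self.min_samples = min_samples  # warm-up before alarms are allowed
        self.reset()

    def reset(self):
        self.n = 0          # number of observations seen
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation m_t
        self.cum_min = 0.0  # running minimum M_t of m_t

    def update(self, x):
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n < self.min_samples:
            return False
        return (self.cum - self.cum_min) > self.threshold


def first_drift_index(stream, detector):
    """Return the index of the first observation that triggers an alarm."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic signal: the mean jumps from 0 to 5 at index 200.
    stream = [0.0] * 200 + [5.0] * 200
    print(first_drift_index(stream, PageHinkley()))
```

In a setting like the one described here, the value passed to `update` could be any per-record statistic derived from the stream, for example a distance from the nearest cluster centre; that choice is an assumption made for illustration. After an alarm, a detector of this kind is typically reset with `reset()` before monitoring continues.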

BACKGROUND
EXPERIMENTS AND RESULTS
DISCUSSION AND FUTURE
CONCLUSION