Abstract

Mining distributed data streams has been a focus of much research in recent years and has raised many challenging problems. One of these is learning and maintaining global patterns from multiple data streams in distributed environments. In this paper, we discuss micro-cluster based classification in distributed data streams and propose methods for mining such streams that handle both labeled and unlabeled data. At each local site, a micro-cluster based ensemble is used, and algorithms for updating it are designed. Using time-based sliding windows, the local models for a fixed time span are generated at all local sites and then transferred to a central site, where the global patterns for that time span are mined. In our methods, the global patterns are micro-cluster based rather than typical classifiers such as decision trees, so that the expected classification accuracy can be obtained while higher mining performance is ensured. The experiments show that these methods classify multiple data streams in distributed environments effectively and efficiently.
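
The pipeline summarized above (local micro-clusters, transfer to a central site, micro-cluster based classification) can be illustrated with a minimal sketch. The abstract does not specify the micro-cluster structure or the update and merge rules, so everything here is an assumption: the summary is a CluStream-style cluster-feature vector (count, linear sum, squared sum) with a class label, `radius` is a hypothetical absorption threshold, and the names `MicroCluster`, `update_local`, `merge_global`, and `classify` are illustrative rather than the paper's. The local ensemble and window expiry are not modeled; one call to `merge_global` stands for one time span.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary of a group of points (assumed CF-vector layout)."""
    n: int          # number of points absorbed
    ls: list        # per-dimension linear sum
    ss: list        # per-dimension sum of squares
    label: object   # class label, or None for unlabeled data

    @classmethod
    def from_point(cls, x, label=None):
        return cls(1, list(x), [v * v for v in x], label)

    def absorb(self, x):
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def merge(self, other):
        # Summaries are additive, so merging needs no raw points.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def centroid(self):
        return [s / self.n for s in self.ls]


def update_local(clusters, x, label, radius):
    """Local site: absorb x into the nearest same-label micro-cluster
    within `radius`, otherwise open a new micro-cluster (assumed rule)."""
    best, best_d = None, radius
    for mc in clusters:
        if mc.label != label:
            continue
        d = math.dist(mc.centroid(), x)
        if d <= best_d:
            best, best_d = mc, d
    if best is None:
        clusters.append(MicroCluster.from_point(x, label))
    else:
        best.absorb(x)


def merge_global(local_models, radius):
    """Central site: combine the micro-clusters sent by all local sites
    for one time span (assumed rule: merge same-label clusters whose
    centroids lie within `radius`)."""
    merged = []
    for model in local_models:
        for mc in model:
            target = None
            for g in merged:
                if g.label == mc.label and math.dist(g.centroid(), mc.centroid()) <= radius:
                    target = g
                    break
            if target is None:
                # Copy so the local model is left untouched.
                merged.append(MicroCluster(mc.n, list(mc.ls), list(mc.ss), mc.label))
            else:
                target.merge(mc)
    return merged


def classify(clusters, x):
    """Predict the label of the nearest labeled micro-cluster centroid."""
    labeled = [mc for mc in clusters if mc.label is not None]
    nearest = min(labeled, key=lambda mc: math.dist(mc.centroid(), x))
    return nearest.label
```

In this sketch each site calls `update_local` for every arriving point, ships its list of micro-clusters at the end of a time span, and the central site calls `merge_global` followed by `classify`. Because only the summaries travel, the communication cost depends on the number of micro-clusters rather than on the number of stream records.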
