Abstract

Many applications now produce streaming data, which motivates techniques for extracting knowledge from such sources. Consequently, data stream clustering algorithms have attracted increasing interest. However, applying these algorithms in real systems remains challenging, because data streams often come from non-stationary environments, which can make it difficult to choose a proper set of model parameters for fitting the data or to find the correct number of clusters. This work proposes an evolving clustering algorithm based on a mixture of typicalities. It builds on the TEDA framework and divides the clustering problem into two subproblems: micro-clusters and macro-clusters. Experimental results on benchmark data sets showed that the proposed method can provide good results for clustering data and estimating its density, even in the presence of events that affect the parameters of the data distribution, such as concept drifts. In addition, the model's parameters proved robust compared with those of state-of-the-art algorithms.
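The abstract builds on TEDA (Typicality and Eccentricity Data Analytics) without spelling out its core quantities. As a rough illustration only, the sketch below computes the recursive mean, variance, eccentricity, and typicality of a single data cloud as they are commonly defined in the TEDA literature. It is not the authors' mixture-of-typicalities algorithm and does not model micro- or macro-clusters; the class and method names (`TedaCloud`, `update`, `eccentricity`, `typicality`, `normalized_typicality`) are hypothetical.

```python
class TedaCloud:
    """Recursive TEDA-style statistics for a single data cloud.

    Hypothetical helper for illustration; not the paper's algorithm.
    """

    def __init__(self, dim):
        self.k = 0                  # number of samples absorbed so far
        self.mean = [0.0] * dim     # recursive mean mu_k
        self.var = 0.0              # scalar variance sigma_k^2 (mean squared distance to mu_k)

    def update(self, x):
        """Absorb sample x, updating mu_k and sigma_k^2 without storing past samples."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = [float(v) for v in x]
            self.var = 0.0
            return
        # mu_k = ((k-1)/k) * mu_{k-1} + x_k / k
        self.mean = [((k - 1) / k) * m + v / k for m, v in zip(self.mean, x)]
        # sigma_k^2 = ((k-1)/k) * sigma_{k-1}^2 + ||x_k - mu_k||^2 / (k-1)
        self.var = ((k - 1) / k) * self.var + self._dist2(x) / (k - 1)

    def _dist2(self, x):
        """Squared Euclidean distance from x to the current mean."""
        return sum((v - m) ** 2 for v, m in zip(x, self.mean))

    def eccentricity(self, x):
        """xi_k(x) = 1/k + ||mu_k - x||^2 / (k * sigma_k^2)."""
        if self.k < 2 or self.var == 0.0:
            raise ValueError("eccentricity needs at least two distinct samples")
        return 1.0 / self.k + self._dist2(x) / (self.k * self.var)

    def typicality(self, x):
        """tau_k(x) = 1 - xi_k(x); can be negative for far-away points."""
        return 1.0 - self.eccentricity(x)

    def normalized_typicality(self, x):
        """t_k(x) = tau_k(x) / (k - 2); defined for k > 2."""
        if self.k <= 2:
            raise ValueError("normalized typicality needs k > 2")
        return self.typicality(x) / (self.k - 2)


if __name__ == "__main__":
    points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    cloud = TedaCloud(dim=2)
    for p in points:
        cloud.update(p)
    # Eccentricities of the absorbed samples sum to 2 (up to rounding).
    print(sum(cloud.eccentricity(p) for p in points))
    # A point far from the cloud is much more eccentric than its center.
    print(cloud.eccentricity((1.0, 1.0)), cloud.eccentricity((10.0, 10.0)))
```

Each update keeps only k, the mean, and the variance, so it costs O(d) time and memory for d-dimensional data, which is what makes this kind of recursion attractive for streams. Because sigma_k^2 is the mean squared distance to mu_k, the eccentricities of the absorbed samples always sum to 2 and their normalized typicalities sum to 1.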
