Abstract

Clustering on categorical data streams still lacks a cluster validity function and an optimization strategy for finding clusters and capturing the evolution trend of cluster structures. This paper therefore presents an optimization model for clustering categorical data streams. In the model, a cluster validity function is proposed as the objective function to evaluate the effectiveness of the clustering model as each new input data subset arrives. It simultaneously considers the certainty of the clustering model and its continuity with the previous clustering model. An iterative optimization algorithm is proposed to find an optimal solution of the objective function under some constraints. Furthermore, we rigorously derive a detection index for drifting concepts from the optimization model. We propose a detection method that integrates the detection index and the optimization model to capture the evolution trend of cluster structures on a categorical data stream. The new method accounts for the effect of clustering validity on the detection result. Finally, experimental studies on several real data sets illustrate the effectiveness of the proposed algorithm in clustering categorical data streams, compared with existing data-stream clustering algorithms.
