Abstract

The need for fuzzy clustering arises in many real-world applications, such as grouping users by their web-browsing behavior, where the behavior of one user can resemble that of two different sets of users at the same time. Fuzzy clustering is even better suited to data streams because of their concept-evolving nature. Data streams can be clustered by following either a clustering-by-variable approach or a clustering-by-example approach. Most existing fuzzy clustering-by-variable methods are applicable to numeric data streams only. In this article, a fuzzy hierarchical clustering method is proposed for clustering multiple nominal data streams using the clustering-by-variable approach. The fuzzy affinity of a data stream to each cluster is calculated using normalized cosine similarity to the cluster centroids. The method handles concept evolution by updating the hierarchical clustering structure, merging and/or splitting nodes depending on the extent to which the node entropy changes. The performance of the proposed method is analyzed and compared with hierarchical clustering for multiple nominal data streams (HCND), semifuzzy online divisive-agglomerative clustering, and nTreeClus on synthetic data as well as a real-world web-browsing dataset, where it outperformed all three in terms of cluster quality as quantified by the Dunn index, the modified Hubert $\Gamma$ statistic, and the adjusted Rand index. Furthermore, the experimental results show that the proposed method is highly promising at capturing fuzzy clusters, as indicated by the Xie-Beni index, partition coefficient, and partition entropy.
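The fuzzy affinity calculation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each nominal stream and each cluster centroid is summarized as a vector of category frequencies, and that "normalized" means dividing each cosine similarity by the sum over all clusters so the memberships add up to 1. The function names `cosine` and `fuzzy_memberships` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuzzy_memberships(stream_vec, centroids):
    """Membership of one stream in each cluster.

    Assumption: memberships are cosine similarities to the centroids,
    rescaled to sum to 1. Frequency vectors are non-negative, so the
    similarities are non-negative as well.
    """
    sims = [cosine(stream_vec, c) for c in centroids]
    total = sum(sims)
    if total == 0:
        # No similarity to any centroid: fall back to uniform membership.
        return [1.0 / len(centroids)] * len(centroids)
    return [s / total for s in sims]


# Hypothetical category-frequency vectors over three nominal values.
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mixed_stream = [1.0, 1.0, 0.0]   # resembles both clusters equally
pure_stream = [2.0, 0.0, 0.0]    # resembles only the first cluster

mixed = fuzzy_memberships(mixed_stream, centroids)
pure = fuzzy_memberships(pure_stream, centroids)
```

Under these assumptions, a stream that resembles two centroids equally receives equal membership in both clusters, which is the behavior the web-browsing example in the abstract motivates.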
