Abstract

We propose an incremental fuzzy clustering algorithm for hybrid data discovery. The algorithm is based on the ASM (Ants Sleeping Model) approach, in which data items are represented by agents placed on a two-dimensional grid. The agents group themselves into clusters by making simple moves in their environment: they try to move closer to one another if they are similar and away from one another if they are different. The algorithm allocates a new agent on the grid whenever a new data item arrives. At each step the new agent contacts an agent on the grid and, if the two are similar, they group together in the same cluster. Whenever a new cluster is created, the agents try to merge it with one of the previously created clusters. If a newly created agent does not find a similar agent, it starts an ASM-like process to search for one, and thus the data is clustered.

Several clustering algorithms exist, each with its own strengths and weaknesses. Some algorithms need an initial estimate of the number of clusters (k-means, fuzzy c-means); others can be too slow (agglomerative hierarchical clustering algorithms). Ant-based clustering algorithms often require hybridization with a classical clustering algorithm such as k-means.

In [2] an ant-based clustering algorithm is presented. It is based on the ASM (Ants Sleeping Model) approach. In ASM, an ant on a two-dimensional grid has two states: an active state and a sleeping state. When the artificial ant's fitness is low, it has a higher probability of waking up and staying in the active state. It then leaves its original position to search for a more secure and comfortable position in which to sleep. When an ant finds such a position, it has a higher probability of sleeping, unless the surrounding environment becomes less hospitable and activates it again.

In [3] a Stigmergic Agent System (SAS) combining the strengths of Ant Colony Systems and Multi-Agent Systems concepts is proposed. The agents in the SAS use both direct and indirect communication. Direct communication lowers the risk of getting trapped in local optima. However, as shown in [16], most ant-based algorithms can be used only in a first phase of the clustering process because of the high number of clusters that they usually produce. In a second phase a k-means-like algorithm is often used.

In [16], an algorithm in which the behaviour of the artificial ants is governed by fuzzy IF-THEN rules is presented. As with all ant-based clustering algorithms, no initial partitioning of the data is needed, nor does the number of clusters need to be known in advance. The ants are capable of making their own decisions about picking up items. Hence the two phases of the classical ant-based clustering algorithm are merged into one, and k-means becomes superfluous.
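The sleeping/active behaviour described for ASM can be illustrated with a small sketch. The text only states that low fitness raises the probability of waking up, so the concrete fitness formula, the activation function, and the parameter names `alpha`, `beta` and `lam` below are assumptions chosen for illustration, not the definitions used in [2].

```python
import random


def fitness(agent, neighbours, distance, alpha):
    """Average similarity of `agent` to the agents in its grid neighbourhood.

    Assumed form: each neighbour contributes 1 - d/alpha, and the mean is
    clamped to [0, 1]. An empty neighbourhood yields fitness 0.
    """
    if not neighbours:
        return 0.0
    total = sum(1.0 - distance(agent, other) / alpha for other in neighbours)
    return max(0.0, min(1.0, total / len(neighbours)))


def wake_probability(fit, beta=0.1, lam=2.0):
    """Assumed activation rule: the lower the fitness, the likelier the agent wakes."""
    return beta**lam / (beta**lam + fit**lam)


def next_state(fit, rng=random):
    """Draw the agent's next state from the wake probability."""
    return "active" if rng.random() < wake_probability(fit) else "sleeping"
```

An agent surrounded by similar neighbours gets a fitness near 1 and therefore tends to stay asleep, while an isolated or badly placed agent gets a fitness near 0 and is almost certain to become active and move.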

Highlights

  • Several clustering algorithms exist, each with its own strengths and weaknesses

  • Incremental clustering is used to process sequential, continuous data flows or data streams and in situations in which cluster shapes change over time

  • Incremental algorithms are well suited to real-time systems, wireless sensor networks or data streams, because in such systems it is difficult to store the datasets in memory

Summary

INTRODUCTION

Several clustering algorithms exist, each with its own strengths and weaknesses. Some algorithms need an initial estimate of the number of clusters (k-means, fuzzy c-means); others can be too slow (agglomerative hierarchical clustering algorithms). In [2] an ant-based clustering algorithm is presented. It is based on the ASM (Ants Sleeping Model) approach. The agents are able to detect changes in the environment and adjust their moves. The advantage of this approach is that it enables the ants to communicate directly, as in [3], breaking the neighbourhood boundaries and reducing the chance that ants get trapped in local minima. To solve the clustering problem we propose an incremental algorithm based on ASM [2, 6]. Incremental clustering is used to process sequential, continuous data flows or data streams, and in situations in which cluster shapes change over time. Incremental algorithms are well suited to real-time systems, wireless sensor networks or data streams, because in such systems it is difficult to store the datasets in memory. The advantages and drawbacks of the approach, together with some concluding remarks, are presented in the closing Section 7.
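To make the incremental idea concrete, the following is a minimal sketch of processing items one at a time as they arrive. It is deliberately simplified and is not the proposed algorithm: it uses crisp rather than fuzzy membership, a fixed similarity `threshold`, and a plain "start a new cluster" step where the ASM-like search would run. The class name `IncrementalClusterer` and all of its parameters are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IncrementalClusterer:
    """Toy one-pass clusterer: each arriving item is compared with stored items."""

    def __init__(self, threshold, distance=euclidean):
        self.threshold = threshold   # assumed similarity cut-off
        self.distance = distance
        self.clusters = []           # each cluster is a list of items

    def add(self, item):
        """Place one new item and return the index of the cluster it joined."""
        # Clusters that contain at least one item similar to the new one.
        hits = [i for i, members in enumerate(self.clusters)
                if any(self.distance(item, m) <= self.threshold for m in members)]
        if not hits:
            # No similar item found: open a new cluster (stand-in for the ASM-like search).
            self.clusters.append([item])
            return len(self.clusters) - 1
        target = self.clusters[hits[0]]
        target.append(item)
        # The new item links several clusters, so merge them into the first one.
        for i in reversed(hits[1:]):
            target.extend(self.clusters.pop(i))
        return hits[0]
```

For example, feeding the stream `(0, 0), (0, 1), (10, 10), (10, 11)` with `threshold=1.5` creates two clusters without ever revisiting earlier items, which is the property that makes incremental methods attractive when the whole dataset cannot be held for repeated passes.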

MOTIVATION
RELATED WORK
THEORETICAL BACKGROUND
INCREMENTAL FUZZY CLUSTERING
Formal aspects
Our approach
EXPERIMENTS
Findings
CONCLUSION