Abstract

Data clustering has always been an important aspect of data mining. Extracting clusters from data can be very difficult, especially when the number of features is large and the classes are not clearly partitioned; hence the need for high-quality clustering techniques. A major shortcoming of many clustering techniques is that the number of clusters must be specified before clustering starts. A recent successful approach is the Clustering analysis based on Glowworm Swarm Optimization (CGSO) algorithm, which uses the multimodal search capacity of the Glowworm Swarm Optimization (GSO) algorithm to identify clusters in a data set automatically, without prior knowledge of the number of clusters. However, the sensor range, a CGSO parameter that is a key factor in both the number of clusters and the cluster quality, is obtained by trial and error, which is an inefficient approach. This paper therefore proposes the Modified Clustering analysis based on Glowworm Swarm Optimization (CGSOm) algorithm. CGSOm extends CGSO in three ways: it determines the sensor range efficiently and automatically, it modifies the glowworm initialization method, and it introduces a function that measures the cluster error during the iteration phase. The proposed algorithm was tested on artificial and real-world data sets. Experimental results show that, for most data sets, CGSOm yields better cluster quality, measured by entropy and purity, than both the original CGSO algorithm and four standard clustering algorithms commonly used in the literature.
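The abstract evaluates clusters by entropy and purity. The sketch below uses the common textbook definitions, which are an assumption here because the paper's exact formulation (for example, the logarithm base or any normalization) may differ. Purity is the fraction of points that belong to the majority true class of their assigned cluster (higher is better). Entropy is the size-weighted average of each cluster's class entropy, computed here with base-2 logarithms (lower is better). The function names `purity` and `entropy` are illustrative only.

```python
import math
from collections import Counter


def _group_labels(clusters, labels):
    """Group true class labels by assigned cluster id."""
    if len(clusters) != len(labels) or not labels:
        raise ValueError("clusters and labels must be non-empty and equally long")
    members = {}
    for c, y in zip(clusters, labels):
        members.setdefault(c, []).append(y)
    return members


def purity(clusters, labels):
    """Fraction of points that fall in their cluster's majority true class."""
    members = _group_labels(clusters, labels)
    majority_total = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean of per-cluster class entropy (base-2 logs, an assumption)."""
    members = _group_labels(clusters, labels)
    n = len(labels)
    total = 0.0
    for ys in members.values():
        size = len(ys)
        # Shannon entropy of the class distribution inside this cluster
        h = -sum((cnt / size) * math.log2(cnt / size) for cnt in Counter(ys).values())
        total += (size / n) * h
    return total
```

For example, with cluster assignments `[0, 0, 0, 1, 1, 1]` and true labels `["a", "a", "b", "b", "b", "b"]`, cluster 0 is mixed (two `a`, one `b`) and cluster 1 is pure, giving a purity of 5/6 (about 0.833) and an entropy of about 0.459. A perfect clustering gives purity 1.0 and entropy 0.0.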
