Abstract

Fuzzy C-Means (FCM) is a widely accepted clustering technique. However, it often cannot manage the different uncertainties associated with data. Interval Type-2 Fuzzy C-Means (IT2FCM) improves on FCM because it can model uncertainty and minimize its effect efficiently. On large data, however, IT2FCM often gets trapped in local optima and fails to find optimal cluster centers. To overcome this challenge, an Ant Colony-based Optimization (ACO) is proposed. Another challenge is determining the number of clusters to use. Subtractive clustering (SC) is an efficient technique for estimating an appropriate number of clusters. For large datasets, however, the convergence time of ACO and SC becomes high, and it becomes challenging to cluster the data and to evaluate the correct number of clusters. To counter the challenges of large datasets, a Multi-Round Sampling (MRS) technique is proposed. IT2FCM-ACO with SC and MRS performs clustering on subsets of the data and determines suitable cluster centers and a suitable cluster number. The obtained clusters are then extended to the entire dataset, which eliminates the need for IT2FCM to work on the complete dataset. Thus, the objective of this paper is to optimize IT2FCM using the ACO algorithm and to estimate the optimal number of clusters using SC, while employing MRS to handle the challenges of voluminous data. Results obtained from several clustering evaluation measures show the improved performance of IT2FCM-ACO-MRS compared to IT2FCM-ACO and IT2FCM. The speed-up is computed for different sample sizes of the dataset: IT2FCM-ACO-MRS is found to be ≈1–5 times faster than IT2FCM and IT2FCM-ACO for medium datasets, whereas for large datasets it is reported to be ≈30–150 times faster.

Highlights

  • Clustering is the process of assigning a homogenous group of objects into subsets called clusters so that objects in each cluster are more similar to each other than objects from different clusters based on the values of their attributes [1]

  • For validation of the proposed method, the results were compared with Fuzzy C-Means (FCM), adaptive FCM (AFCM) and Interval Type-2 Fuzzy C-Means (IT2FCM); the results show that a genetic algorithm (GA) improves the performance of IT2FCM by determining appropriate cluster centroids

  • The results obtained from different cluster validity indices for IT2FCM with Ant Colony-based Optimization (IT2FCM-ACO) and IT2FCM-ACO with Multi-Round Sampling (IT2FCM-ACO-MRS) are discussed and compared with those of IT2FCM with Alternating Optimization (IT2FCM-AO)


Summary

Introduction

Clustering is the process of assigning a homogenous group of objects into subsets called clusters so that objects in each cluster are more similar to each other than objects from different clusters based on the values of their attributes [1]. FCM [10, 11] is a commonly used technique for fuzzy clustering analysis because of its capability to handle uncertainty. FCM assigns each data object partially to multiple clusters with a certain degree of membership, and so handles overlapping partitions. The degree of membership in a fuzzy cluster depends on the closeness of the data object to the cluster centres. FCM performs well in data clustering and has been the basis for other clustering algorithms, but it is very susceptible to noise and cannot handle the many uncertainties associated with a data set. A single fuzzifier cannot handle uncertainty for interval type-2 fuzzy sets, so two fuzzifiers, m1 and m2, are defined to represent different fuzzy degrees.
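The distance-based membership and the role of the two fuzzifiers can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes the standard FCM membership formula with Euclidean distance, and it assumes one common interval type-2 formulation in which the lower and upper memberships are the element-wise minimum and maximum of the memberships computed with m1 and m2. The function names fcm_membership and interval_membership are hypothetical.

```python
import numpy as np

def fcm_membership(X, centers, m):
    """Standard FCM membership (assumed form):
    u[i, j] = 1 / sum_k (d[i, j] / d[i, k]) ** (2 / (m - 1))."""
    # Euclidean distance from every point to every centre, shape (n, c)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # guard against a point lying exactly on a centre
    # ratio[i, j, k] = (d[i, j] / d[i, k]) ** (2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

def interval_membership(X, centers, m1, m2):
    """Lower and upper memberships from two fuzzifiers (assumed min/max form)."""
    u1 = fcm_membership(X, centers, m1)
    u2 = fcm_membership(X, centers, m2)
    return np.minimum(u1, u2), np.maximum(u1, u2)

# Illustrative data: three points on a line, two fixed centres
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
lower, upper = interval_membership(X, centers, m1=2.0, m2=4.0)
```

Each row of the type-1 membership matrix sums to 1, and a point closer to a centre receives a higher membership in that cluster. The gap between the lower and upper values is what represents the uncertainty that a single fuzzifier cannot express.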

