Abstract

Real-world data are often corrupted by noise and outliers, which originate from procedures such as data collection, storage, and processing. Noise and outliers degrade clustering quality and lead to inaccurate, misplaced cluster centers. In this paper, we propose a new algorithm called Improved Possibilistic Fuzzy C-Means (IPFCM) to cluster noisy data. First, initial cluster centers are calculated by Possibilistic Fuzzy C-Means (PFCM); these centers do not match the dense regions of the data. Then, the domain is divided into subdomains and each data point is assigned to a subdomain. The cluster centers are iteratively moved towards high-density regions by maximizing a novel cluster validity index. To compute this index, a Gaussian membership function is defined on each cluster to weight the data, the weights in each cluster are summed, and the product of these sums is taken as the validity index. Since the division of the domain changes as the cluster centers move, this procedure is repeated until the convergence criterion is satisfied. Cluster analysis performed on six synthetic and nine real benchmark datasets shows the superiority of IPFCM over previous clustering algorithms such as Fuzzy C-Means (FCM), PFCM, Kernel Fuzzy C-Means (KFCM), Noise Clustering (NC), and Generalized Entropy based Possibilistic Fuzzy C-Means (GEPFCM). The clustering results for near-fault ground motion data indicate that the cluster centers identified by IPFCM are well separated from each other, while those identified by PFCM are close to each other in some datasets. Moreover, the results show that the impact of noisy data on the proposed index, and consequently on the cluster analysis, decreases as the noisy data move farther from the cluster centers, which is one of the advantages of the IPFCM algorithm.
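
The validity index described above (Gaussian weights, summed per cluster, then multiplied) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how subdomains are formed or how the Gaussian width is chosen, so the sketch assumes that each point belongs to the subdomain of its nearest center and that all clusters share one width `sigma`. The function name `validity_index` and the parameter `sigma` are hypothetical.

```python
import numpy as np

def validity_index(X, centers, sigma=1.0):
    """Product over clusters of the summed Gaussian weights.

    Assumptions (not stated in the abstract): a point belongs to the
    subdomain of its nearest center, and every cluster uses the same
    Gaussian width `sigma`.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)

    # Squared Euclidean distances, shape (n_points, n_clusters).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

    # Assumed subdomain assignment: index of the nearest center.
    labels = d2.argmin(axis=1)

    # Gaussian membership of each point with respect to each center.
    weights = np.exp(-d2 / (2.0 * sigma ** 2))

    # Sum the weights of the points inside each cluster's subdomain.
    sums = np.array([weights[labels == k, k].sum()
                     for k in range(len(centers))])

    # The index is the product of the per-cluster sums.
    return float(np.prod(sums))

# Two small groups of points and two candidate sets of centers.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
dense_centers = np.array([[0.0, 0.0], [5.0, 5.0]])
shifted_centers = np.array([[1.0, 1.0], [4.0, 4.0]])

print(validity_index(X, dense_centers) > validity_index(X, shifted_centers))
```

Under these assumptions, centers placed in the dense regions score higher than centers shifted away from them. A point far from every center receives a weight close to zero, which is consistent with the abstract's remark that distant noise has little effect on the index.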
