Abstract
A generalized form of the Possibilistic Fuzzy C-Means (PFCM) algorithm, GPFCM, is presented for clustering noisy data. A function of the distance, rather than the distance itself, is used to damp the contribution of noise. It is shown that when the data are highly noisy, GPFCM finds accurate cluster centers, whereas the FCM (Fuzzy C-Means), PCM (Possibilistic C-Means), and PFCM algorithms fail. FCM, PCM, and PFCM yield inaccurate cluster centers when the clusters are not of the same size or when the covariance norm is used, whereas GPFCM performs well in both cases, even when the data are noisy. It is shown that the generalized forms of FCM and PCM (GFCM and GPCM) are also more accurate than FCM and PCM. A measure is defined to evaluate the performance of the clustering algorithms. It shows that the average errors of GPFCM and its simplified forms are about 80% smaller than those of FCM, PCM, and PFCM. However, GPFCM incurs a higher computational cost because its update equations are nonlinear. Three cluster validity indices are introduced to determine the number of clusters in clean and noisy datasets. One considers the compactness of the clusters, another considers their separation, and the third considers both separation and compactness. The performance of these indices is shown to be satisfactory on various examples of noisy datasets.
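To illustrate why replacing the distance with a function of the distance can damp noise, the following is a minimal one-dimensional sketch. It is not the GPFCM update from the paper: the abstract does not state which function is used, so the saturating form 1 - exp(-beta * d^2) (as in the "alternative c-means" family of robust methods) is an assumption, as are the names `damped`, `memberships`, `update_center_plain`, `update_center_damped`, and the parameter `beta`. The sketch compares one center update under the standard FCM rule with one under the saturating dissimilarity.

```python
import math

def damped(d2, beta=1.0):
    """Hypothetical saturating function of squared distance (an assumption,
    not taken from the paper): close to beta*d2 near 0, levels off at 1."""
    return 1.0 - math.exp(-beta * d2)

def memberships(x, centers, m=2.0, f=lambda d2: d2):
    """FCM-style memberships of point x, using f(squared distance)
    as the dissimilarity to each center. Memberships sum to 1."""
    eps = 1e-12  # avoids division by zero when x coincides with a center
    d = [f((x - c) ** 2) + eps for c in centers]
    inv = [di ** (-1.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [v / total for v in inv]

def update_center_plain(points, centers, k, m=2.0):
    """Standard FCM center update: mean of the points weighted by u^m."""
    w = [memberships(x, centers, m)[k] ** m for x in points]
    return sum(wi * x for wi, x in zip(w, points)) / sum(w)

def update_center_damped(points, centers, k, m=2.0, beta=1.0):
    """One fixed-point step for the saturating dissimilarity:
    weights are u^m * exp(-beta * d2), so far-away points get ~0 weight."""
    w = []
    for x in points:
        u = memberships(x, centers, m, lambda d2: damped(d2, beta))[k]
        w.append(u ** m * math.exp(-beta * (x - centers[k]) ** 2))
    return sum(wi * x for wi, x in zip(w, points)) / sum(w)

# Two tight clusters near 0 and 5, plus a single noise point at 100.
points = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 100.0]
centers = [0.0, 5.0]

plain_c = update_center_plain(points, centers, k=1)
damped_c = update_center_damped(points, centers, k=1)
print("plain FCM update of center 1: ", plain_c)
print("damped update of center 1:    ", damped_c)
```

In this toy case the plain update drags the second center toward the noise point, because FCM memberships must sum to one and the outlier therefore keeps a substantial weight. Under the saturating dissimilarity the outlier's weight is effectively zero and the center stays near 5. The price is a nonlinear update that has to be iterated to a fixed point, consistent with the higher computational cost the abstract mentions.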