Abstract

The possibilistic fuzzy c-means (PFCM) algorithm is a reliable algorithm proposed to address the noise sensitivity of fuzzy c-means (FCM) and the coincident clusters of possibilistic c-means (PCM). However, the PFCM algorithm is only applicable to complete data sets. This research therefore modified the PFCM for clustering incomplete data sets, yielding OCSPFCM and NPSPFCM, and evaluated their performance on three aspects: 1) accuracy percentage, 2) the number of iterations, and 3) centroid errors. The results showed that the NPSPFCM outperforms the OCSPFCM with missing values ranging from 5% to 30% for all experimental data sets. Furthermore, the two algorithms provide average accuracies of 97.75% to 78.98% and 98.86% to 92.49%, respectively.

Highlights

  • Incomplete data sets are commonly found in the real world due to failures during the collection, merging, cleaning, and transfer of data from one source to another [1]

  • This study analysed the potential and performance of modifications of the Possibilistic Fuzzy C-Means (PFCM) algorithm for clustering incomplete data sets. The two modifications derived from the PFCM are Optimal Completion Strategy Possibilistic Fuzzy C-Means (OCSPFCM) and Nearest Prototype Strategy Possibilistic Fuzzy C-Means (NPSPFCM)

  • The cluster results obtained at this stage are the basis for evaluating the performance of the OCSPFCM and NPSPFCM algorithms

Summary

Introduction

Incomplete data sets are commonly found in the real world due to failures during the collection, merging, cleaning, and transfer of data from one source to another [1]. The main problem in clustering incomplete data sets is that existing algorithms cannot process them directly, because popular clustering algorithms such as fuzzy c-means (FCM) [2] and possibilistic c-means (PCM) [3] are designed for complete data sets. Bezdek and Hathaway [4] adapted the FCM algorithm to cluster data sets with missing values. They proposed whole data strategy fuzzy c-means (WDSFCM), which removes features that contain missing values so that the remaining data are complete and then runs the standard FCM algorithm. The WDSFCM produces biased clustering results when the proportion of missing values is large.
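
The whole data strategy described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that missing values are encoded as NaN and that "removing features that contain missing values" means discarding every record (feature vector) with at least one missing entry. The function names `fcm` and `wds_fcm`, the fuzzifier `m = 2`, the stopping tolerance, and the synthetic two-cluster data are all hypothetical choices made for illustration.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means on a complete array X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix U of shape (n, c); each row sums to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for n_iter in range(1, max_iter + 1):
        W = U ** m
        # Centroids are membership-weighted means, shape (c, d).
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances, shape (n, c); floored to avoid 0-division.
        D2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        D2 = np.maximum(D2, 1e-12)
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U, n_iter

def wds_fcm(X, c, **kwargs):
    """Whole data strategy (assumed reading): drop incomplete records, then run FCM."""
    keep = ~np.isnan(X).any(axis=1)   # True for records without missing values
    V, U, n_iter = fcm(X[keep], c, **kwargs)
    return V, U, keep, n_iter

# Synthetic example (assumption): two well-separated clusters, 20% incomplete records.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
X[rng.choice(100, 20, replace=False), 0] = np.nan

V, U, keep, n_iter = wds_fcm(X, c=2)
print(int(keep.sum()), "of", len(X), "records used for clustering")
```

Because the discarded records never influence the centroids, the share of data actually used shrinks as the missing rate grows, which is one way to see why the strategy becomes biased when many values are missing.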
