Abstract

This paper presents a robust, dynamic, and unsupervised fuzzy learning algorithm (RDUFL) that clusters a set of data samples while detecting outliers and determining the number of clusters automatically. It consists of three main stages. The first stage is a pre-processing method in which possible outliers are identified and quarantined using a concept of proximity degree. The second stage is a learning method that auto-detects the number of classes and their prototypes using a dynamic threshold. This threshold is determined automatically from the similarity among the detected prototypes, which are updated as each new data sample is explored. The third stage processes the samples quarantined in the first stage to determine whether they belong to one of the classes defined in the second stage. The effectiveness of this method is assessed on eight real medical benchmark datasets in comparison to known unsupervised learning methods, namely fuzzy c-means (FCM), possibilistic c-means (PCM), and noise clustering (NC). The accuracy obtained by our scheme is promising for unsupervised learning problems.

Highlights

  • Clustering is one of the most relevant data-mining tasks [42]

  • To assess the performance of our approach, experiments were conducted on an artificial dataset X1 and on eight real-world datasets available in the UCI repository [8]: Lymphography, Diabetes, Indian, Haberman’s Survival, BCW, Post-operative Patient, Parkinsons, and EEG Eyes State

Summary

Introduction

Clustering is one of the most relevant data-mining tasks [42]. It is the process of organizing objects into a set of classes. We propose a robust approach that clusters data by auto-detecting the classes they form and identifying any outliers, without requiring user-supplied parameters. The proposed approach consists of three stages:

– A pre-processing stage that uses similarity to detect objects likely to be outliers. These objects are treated as possible outliers, quarantined, and excluded from the second stage.
– A second stage in which classes are determined based on a dynamic threshold. This threshold is based on the minimum similarity among the detected prototypes, which are updated each time a new object is explored.
– A final stage that processes the possible outliers to determine whether they belong to one of the classes detected in the second stage.
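The three stages can be illustrated with a minimal Python sketch. It is not the authors' RDUFL implementation: this summary does not define the similarity measure, the proximity degree, or the threshold initialization, so every formula below is an assumption made for illustration. Specifically, similarity is taken as exp(-distance / mean pairwise distance), the proximity degree as a sample's mean similarity to all others, quarantine as falling more than `k_std` standard deviations below the mean proximity (the hypothetical `k_std` is itself a parameter, unlike the parameter-free method described above), and the threshold used before two prototypes exist as the mean proximity of the retained samples. The sketch also assigns crisp labels rather than fuzzy memberships.

```python
import numpy as np


def dynamic_threshold(prototypes, scale, fallback):
    """Minimum similarity among current prototypes (assumed form of the dynamic threshold)."""
    if len(prototypes) < 2:
        return fallback  # assumption: no pair of prototypes exists yet
    P = np.array(prototypes)
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    upper = np.triu_indices(len(P), k=1)  # each prototype pair once
    return float(np.exp(-d[upper] / scale).min())


def cluster_three_stages(X, k_std=2.0):
    """Illustrative three-stage clustering; returns (labels, prototypes), label -1 = outlier."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scale = dists.sum() / (n * (n - 1))  # mean pairwise distance (diagonal is zero)
    sim = np.exp(-dists / scale)

    # Stage 1: quarantine samples with an unusually low proximity degree (assumed rule).
    proximity = (sim.sum(axis=1) - 1.0) / (n - 1)  # exclude self-similarity of 1
    quarantined = proximity < proximity.mean() - k_std * proximity.std()

    # Stage 2: explore retained samples one by one, growing and updating prototypes.
    labels = np.full(n, -1)
    prototypes, counts = [], []
    fallback = float(proximity[~quarantined].mean())
    for i in np.flatnonzero(~quarantined):
        threshold = dynamic_threshold(prototypes, scale, fallback)
        if prototypes:
            s = np.exp(-np.linalg.norm(np.array(prototypes) - X[i], axis=1) / scale)
            best = int(s.argmax())
        if prototypes and s[best] >= threshold:
            counts[best] += 1
            # Running-mean update of the winning prototype.
            prototypes[best] = prototypes[best] + (X[i] - prototypes[best]) / counts[best]
            labels[i] = best
        else:
            prototypes.append(X[i].copy())
            counts.append(1)
            labels[i] = len(prototypes) - 1

    # Stage 3: re-examine quarantined samples against the final prototypes.
    threshold = dynamic_threshold(prototypes, scale, fallback)
    for i in np.flatnonzero(quarantined):
        s = np.exp(-np.linalg.norm(np.array(prototypes) - X[i], axis=1) / scale)
        best = int(s.argmax())
        if s[best] >= threshold:
            labels[i] = best  # sufficiently similar to a class: not an outlier after all
    return labels, np.array(prototypes)
```

Calling `cluster_three_stages` on two tight groups of points plus one distant point should produce two prototypes and leave the distant point labeled -1. Because stage 2 is sequential, the result of this sketch depends on the order in which samples are explored.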

Related Work
The PCM Algorithm
The Robust-FCM Algorithm
Learning Phase
Treatment of Possible Outliers
Results and Discussion
Artificial Dataset
Real-World Dataset
Conclusion