Abstract

Clustering is an unsupervised machine learning method that is used both on its own and as part of the preprocessing stage for supervised machine learning methods. Because it is unsupervised, clustering is typically less accurate than supervised learning. This article aims to introduce a new perspective on clustering by defining an approach for data pruning. The method also enables clustering with multiple sets of prototypes instead of only one set to improve clustering accuracy. Consequently, this approach has the potential to be used independently or as part of a preprocessing step that prepares purified data for the training step of a supervised learning approach. The evolving fuzzy clustering approach (EFCA) uses the fuzzy membership concept to break clustering down into epochs instead of running the clustering on all data at once. In some cases, for supervised learning, we would rather have a smaller subset of accurately labeled data than a larger dataset with less accurate labels. The EFCA's “epoch cut” provides a postpruning capability that eliminates obscure data points, which improves clustering accuracy. The EFCA was applied to eight multivariate and ten time-series datasets. For example, on the Iris dataset, after applying epoch cut and eliminating obscure data (20% of the data) by automatic postpruning, it achieved 100% accuracy on the remaining 80% of the data.
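
The general idea of pruning obscure points by fuzzy membership can be illustrated with a minimal sketch. This is not the EFCA algorithm itself, whose epoch structure and epoch cut are not specified in this abstract. It assumes the standard fuzzy c-means membership formula, fixed prototypes, and a fixed pruning fraction. The names `fuzzy_memberships`, `prune_obscure`, and `prune_fraction` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Standard fuzzy c-means membership of each point to each prototype.

    Assumption: this is the textbook FCM formula, not necessarily the
    membership function used by EFCA.
    """
    # Pairwise Euclidean distances, shape (n_points, n_centers)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def prune_obscure(X, centers, prune_fraction=0.2, m=2.0):
    """Drop the prune_fraction of points whose highest membership is lowest.

    Returns the indices of the kept points and their hard cluster labels.
    Assumption: a fixed fraction stands in for an automatic cut.
    """
    u = fuzzy_memberships(X, centers, m)
    peak = u.max(axis=1)                # confidence of the best assignment
    n_drop = int(round(prune_fraction * len(X)))
    order = np.argsort(peak)            # most ambiguous points first
    keep = np.sort(order[n_drop:])
    return keep, u.argmax(axis=1)[keep]

# Toy data: two tight groups plus one point halfway between the prototypes.
X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0], [5.0, 0.0]])
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
keep, labels = prune_obscure(X, centers, prune_fraction=0.2)
print(keep, labels)
```

In this toy case the midpoint has equal membership in both clusters, so it is the one point removed. The remaining points are assigned to their nearest prototype.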
