Adaptive kernel fuzzy clustering for missing data.

Anny K G Rodrigues, Raydonal Ospina, Marcelo R P Ferreira

doi:10.1371/journal.pone.0259266

Open Access

https://doi.org/10.1371/journal.pone.0259266

Abstract

Many machine learning procedures, including cluster analysis, are often affected by missing values. This work proposes and evaluates a Kernel Fuzzy C-means clustering algorithm based on kernelization of the metric with local adaptive distances (VKFCM-K-LP) under three strategies for dealing with missing data. The first strategy, called the Whole Data Strategy (WDS), performs clustering only on the complete part of the dataset, i.e., it discards all instances with missing data. The second approach uses the Partial Distance Strategy (PDS), in which partial distances are computed over all available (observed) feature values and then rescaled by the reciprocal of the proportion of observed values. The third technique, called the Optimal Completion Strategy (OCS), computes missing values iteratively as auxiliary variables in the optimization of a suitable objective function. The clustering results were evaluated according to different metrics. The best performance of the clustering algorithm was achieved under the PDS and OCS strategies. Under the OCS approach, new datasets were derived and the missing values were estimated dynamically in the optimization process. Clustering under the OCS strategy also outperformed the clusters obtained by applying the VKFCM-K-LP algorithm to a version of the data in which missing values had previously been imputed by the mean or the median of the observed values.
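The WDS and PDS ideas described above can be illustrated with a minimal sketch. It assumes missing entries are encoded as NaN and uses a plain squared Euclidean distance in place of the kernelized adaptive distance of VKFCM-K-LP, which is not specified on this page. The function names `partial_sq_distance` and `complete_rows` are hypothetical and chosen only for illustration.

```python
import numpy as np

def partial_sq_distance(x, v):
    """PDS-style partial squared Euclidean distance (illustrative only).

    x may contain NaN for missing components; v (e.g. a cluster prototype)
    is assumed complete. The sum over observed components is rescaled by
    the reciprocal of the proportion of observed values, p / n_obs.
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    observed = ~np.isnan(x)          # True where x has a value
    n_obs = observed.sum()
    if n_obs == 0:
        raise ValueError("x has no observed components")
    diff = x[observed] - v[observed]
    return (x.size / n_obs) * np.sum(diff ** 2)

def complete_rows(X):
    """WDS-style filtering: keep only rows with no missing values."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]
```

For example, with `x = [1, NaN, 3]` and `v = [0, 5, 1]`, two of the three components are observed, so the observed squared differences (1 and 4) are summed and multiplied by 3/2.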

Highlights

The incessant increase in volume and variety of data requires advances in methodologies in order to understand, process and summarize data automatically.
Datasets with 5%, 10%, 15% and 20% of missing values were artificially generated using the methodology described in Section 7.1, which means that the random variable M was sampled from Bernoulli distributions with parameter θ taken from {0.05, 0.10, 0.15, 0.20}.
The problem of missing data is commonly discussed in several areas of science, as statistical techniques used for data analysis, such as clustering, were originally proposed for datasets without missing values.
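The Bernoulli masking mentioned in the highlights can be sketched as follows. This is only an illustration under stated assumptions: Section 7.1 of the paper is not reproduced here, so the sketch assumes each entry is masked independently with probability θ and that missing values are encoded as NaN. The function name `mask_at_random` and the `seed` parameter are hypothetical.

```python
import numpy as np

def mask_at_random(X, theta, seed=0):
    """Return a copy of X with entries set to NaN where M == 1.

    Assumption: M_ij ~ Bernoulli(theta) independently for every entry.
    The paper's exact procedure (Section 7.1) may differ.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    M = rng.random(X.shape) < theta   # boolean missingness indicator
    X_missing = X.copy()
    X_missing[M] = np.nan
    return X_missing, M

# One masked copy per missingness level listed in the highlights.
X = np.arange(40, dtype=float).reshape(10, 4)
masked = {theta: mask_at_random(X, theta)[0] for theta in (0.05, 0.10, 0.15, 0.20)}
```

Fixing the seed makes the generated masks reproducible, which is convenient when comparing the WDS, PDS and OCS strategies on the same incomplete dataset.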

Summary

Introduction

The incessant increase in the volume and variety of data requires methodological advances in order to understand, process and summarize data automatically. Cluster analysis is one of the main unsupervised techniques used to extract knowledge from data, owing to its ability to aid in understanding and visualizing data structures [1, 2]. The main goal of clustering is to organize the data (observations, data items, images, pixels, etc.) based on similarity (or dissimilarity) criteria, such that observations belonging to the same group show high degrees of similarity, while observations in different groups show high degrees of dissimilarity [3, 4].


Journal: PloS one
Publication Date: Nov 12, 2021
Citations: 4
License type: CC BY 4.0
