Imputation Strategies for Clustering Mixed-Type Data with Missing Values

Rabea Aschenbruck, Gero Szepannek, Adalbert F X Wilhelm

doi:10.1007/s00357-022-09422-y


Open Access

https://doi.org/10.1007/s00357-022-09422-y


Abstract

Incomplete data sets with mixed data types are difficult to handle, yet they occur regularly in practical clustering tasks. This paper therefore derives two procedures for clustering mixed-type data with missing values and analyzes them in a simulation study with respect to the factors of partition, prototypes, imputed values, and cluster assignment. Both approaches are based on the k-prototypes algorithm (an extension of k-means), one of the most common clustering methods for mixed-type data (i.e., numerical and categorical variables). For k-means clustering of incomplete data, the k-POD algorithm was recently proposed; it imputes missing values with the values of the associated cluster center. We derive an adaptation of k-POD and additionally present a cluster aggregation strategy after multiple imputation. It turns out that even a simplified and time-saving variant of the presented method can compete with multiple imputation and subsequent pooling.
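To make the k-POD idea concrete for mixed-type data, the following is a minimal sketch and not the authors' implementation. It assumes a k-prototypes-style distance (squared Euclidean distance on the numerical variables plus a weight `gamma` times the number of categorical mismatches), an initial fill with column means and modes, and a stopping rule based on unchanged cluster labels. The function name `kpod_mixed`, the helper `most_common`, the parameters `init`, `gamma`, and `n_iter`, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter

import numpy as np


def most_common(values):
    """Most frequent entry of a 1-D sequence (the first one seen wins ties)."""
    return Counter(values).most_common(1)[0][0]


def kpod_mixed(num, cat, k, init, gamma=1.0, n_iter=50):
    """Illustrative k-POD-style loop with a k-prototypes-style distance.

    num:  (n, p) numbers, np.nan marks a missing value
    cat:  (n, q) categories, None marks a missing value
    init: indices of the k rows used as starting prototypes (assumption)
    """
    num = np.array(num, dtype=float)
    cat = np.array(cat, dtype=object)
    miss_num = np.isnan(num)
    miss_cat = np.array([[v is None for v in row] for row in cat], dtype=bool)

    # Initial fill: column mean (numerical) and column mode (categorical).
    for j in range(num.shape[1]):
        num[miss_num[:, j], j] = np.nanmean(num[:, j])
    for j in range(cat.shape[1]):
        cat[miss_cat[:, j], j] = most_common(cat[~miss_cat[:, j], j])

    proto_num = num[list(init)].copy()
    proto_cat = cat[list(init)].copy()
    labels = None
    for _ in range(n_iter):
        # Distance = squared Euclidean part + gamma * categorical mismatches.
        d_num = ((num[:, None, :] - proto_num[None, :, :]) ** 2).sum(axis=2)
        d_cat = (cat[:, None, :] != proto_cat[None, :, :]).astype(bool).sum(axis=2)
        new_labels = (d_num + gamma * d_cat).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # simplified stopping rule: assignments no longer change
        labels = new_labels

        # Prototype update: mean for numerical, mode for categorical variables.
        for c in range(k):
            members = labels == c
            if not members.any():
                continue  # keep the old prototype of an empty cluster
            proto_num[c] = num[members].mean(axis=0)
            proto_cat[c] = [most_common(cat[members, j]) for j in range(cat.shape[1])]

        # k-POD step: overwrite only the originally missing cells with the
        # value of the prototype of the cluster the row is assigned to.
        num[miss_num] = proto_num[labels][miss_num]
        cat[miss_cat] = proto_cat[labels][miss_cat]
    return labels, num, cat


# Toy data: two well-separated groups with a few missing cells.
num = [[0.0, 0.0], [1.0, np.nan], [np.nan, 1.0],
       [10.0, 10.0], [11.0, 9.0], [9.0, np.nan]]
cat = [["a"], ["a"], ["a"], ["b"], [None], ["b"]]

labels, num_filled, cat_filled = kpod_mixed(num, cat, k=2, init=(0, 3))
print(labels)
print(num_filled)
print(cat_filled.ravel())
```

The starting rows are passed explicitly through `init` so that the toy run is reproducible; in practice one would typically try several random starts and keep the best partition. Stopping when the labels stabilize is a shortcut, since the imputed values could still be refined by further prototype updates.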


Journal: Journal of Classification
Publication Date: Nov 26, 2022
Citations: 6
License type: open-access


