An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity

Shafiq Alam, Muhammad Sohaib Ayub, Sakshi Arora, Muhammad Asad Khan

doi:10.1016/j.dajour.2023.100341

Abstract

Missing data can significantly impact dataset integrity and suitability, leading to unreliable statistical results, distortions, and poor decisions. The presence of missing values in data introduces inaccuracies in clustering and classification and compromises the reliability and validity of such analyses. This study investigates multiple imputation techniques specifically designed for handling missing values in ordinal data commonly encountered in surveys and questionnaires. Quantitative approaches are used to evaluate different imputation methods on various datasets with varying missing value percentages. By comparing the performance of imputation techniques using clustering metrics and algorithms (e.g., k-means, Partitioning Around Medoids), the study provides valuable insights for selecting appropriate imputation methods for accurate data analysis. Furthermore, the study examines the impact of imputed values on classification algorithms, including k-Nearest Neighbors (kNN), Naive Bayes (NB), and Multilayer Perceptron (MLP). Results demonstrate that the decision tree method is the most effective approach, closely aligning with the original data and achieving high accuracy. In contrast, random number imputation performs poorly, indicating limited reliability. This study advances the understanding of handling missing values and emphasizes the need to address this issue to enhance data analysis integrity and validity.
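The comparison described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the data are synthetic 5-point Likert responses, the "tree" is a one-level multiway split (a stump) standing in for the decision tree method, and the score is simple recovery accuracy on the masked values rather than the clustering and classification metrics used in the study. All function names (`make_likert`, `fit_stump`, `compare_imputers`) and parameter values are assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def make_likert(n, rng):
    """Synthetic 5-point Likert rows (q1, q2) where q2 usually equals q1."""
    rows = []
    for _ in range(n):
        q1 = rng.randint(1, 5)
        # 80% of the time q2 copies q1; otherwise it shifts by one level,
        # clipped to the 1..5 scale.
        if rng.random() < 0.8:
            q2 = q1
        else:
            q2 = min(5, max(1, q1 + rng.choice([-1, 1])))
        rows.append((q1, q2))
    return rows


def fit_stump(rows):
    """One-level multiway tree: for each q1 value, predict the modal q2."""
    groups = defaultdict(Counter)
    for q1, q2 in rows:
        groups[q1][q2] += 1
    # Fallback for a q1 value never seen among the observed rows.
    overall = Counter(q2 for _, q2 in rows).most_common(1)[0][0]
    leaves = {q1: counts.most_common(1)[0][0] for q1, counts in groups.items()}
    return lambda q1: leaves.get(q1, overall)


def compare_imputers(n=500, missing_rate=0.2, seed=0):
    """Mask part of q2, impute it two ways, return (tree_acc, random_acc)."""
    rng = random.Random(seed)
    rows = make_likert(n, rng)
    masked = set(rng.sample(range(n), int(n * missing_rate)))
    observed = [row for i, row in enumerate(rows) if i not in masked]
    predict = fit_stump(observed)

    tree_hits = 0
    random_hits = 0
    for i in masked:
        q1, true_q2 = rows[i]
        tree_hits += predict(q1) == true_q2          # tree-style imputation
        random_hits += rng.randint(1, 5) == true_q2  # random number imputation
    return tree_hits / len(masked), random_hits / len(masked)


tree_acc, random_acc = compare_imputers()
print(f"tree-style imputation accuracy: {tree_acc:.2f}")
print(f"random imputation accuracy:     {random_acc:.2f}")
```

Because the synthetic items are strongly correlated by construction, the tree-style imputer should recover far more masked values than random draws. This mirrors the direction of the reported result but says nothing about its size on real survey data.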

Journal: Decision Analytics Journal
Publication Date: Oct 12, 2023
Citations: 3
License type: CC BY-NC-ND

Similar Papers

Estimation of Missing Values Using Optimised Hybrid Fuzzy C-Means and Majority Vote for Microarray Data
Shamini Raja Kumaran ... Lizawati Mi Yusuf
Journal of Information and Communication Technology | VOL. 19 | 01 Jan 2020

Missing value estimation for DNA microarray gene expression data: local least squares imputation.
Hyunsoo Kim ... Gene H Golub
Bioinformatics | VOL. 21 | 27 Aug 2004

A Comparison of Strategies for Missing Values in Data on Machine Learning Classification Algorithms
Tebogo Makaba ... Eustace Dogo
01 Nov 2019

Application of Deep Learning and Transfer Learning in Continuous Missing Value Imputation of Water Quality Data
Li Lyu ... Meng Fang
09 Dec 2022
