Abstract

Addressing missing values is a persistent challenge in data mining, because incomplete data can significantly compromise overall data quality. Handling incomplete data effectively is therefore crucial. This paper presents a novel missing-value imputation method, termed LIKFCM, which combines Kernelized Fuzzy C-Means (KFCM) clustering with Linear Interpolation (LI). LIKFCM is compared against nine state-of-the-art imputation techniques (mean, median, LI, EMI, KNNI, KMI, FKMI, LIFCM, and LIPFCM) on ten widely used real-world datasets from the UCI repository, each evaluated under six missing-ratio settings. The experimental results show that the proposed method outperforms the existing imputation methods, with significant improvements in RMSE and MAE on these datasets. Additionally, experiments on the effect of the missing ratio confirm that the approach remains robust as the proportion of missing values varies. The proposed approach is further validated against the other imputation methods using Kendall's W statistical test, which compares their mean ranks across the different missing ratios. The outcomes indicate that LIKFCM outperforms the other imputation methods, attaining the highest rank under each evaluation criterion.
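The abstract names the two ingredients of LIKFCM, linear interpolation and Kernelized Fuzzy C-Means, without giving their update rules. The pure-Python sketch below is a minimal, hypothetical illustration of how they could be combined; it is not the authors' implementation. It assumes that LI is applied per attribute along the row order as a pre-fill, that KFCM uses a Gaussian kernel K(x, v) = exp(-||x - v||² / σ²) with prototypes kept in input space (the common kernel-induced-distance variant), and that each missing entry is finally replaced by the membership-weighted average of the cluster prototypes for that attribute. All function names (`linear_interpolate`, `kfcm`, `likfcm_impute`, `rmse_mae`) and defaults (c = 2, m = 2, σ = 1) are illustrative choices. RMSE and MAE are computed only over the artificially masked entries.

```python
import math
import random


def linear_interpolate(column):
    """Fill None entries of one attribute by linear interpolation over row order.

    Assumption: leading/trailing gaps copy the nearest observed value.
    """
    known = [i for i, v in enumerate(column) if v is not None]
    if not known:
        raise ValueError("attribute has no observed values")
    filled = list(column)
    for i, v in enumerate(column):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = column[right]
        elif right is None:
            filled[i] = column[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = column[left] + t * (column[right] - column[left])
    return filled


def gaussian_kernel(x, v, sigma):
    """K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / sigma ** 2)


def memberships(data, centers, m, sigma):
    """u_ik proportional to (1 / (1 - K(x_k, v_i)))^(1 / (m - 1)); each row sums to 1."""
    u = []
    for x in data:
        # Clamp 1 - K away from zero when a point coincides with a prototype.
        w = [(1.0 / max(1.0 - gaussian_kernel(x, v, sigma), 1e-12)) ** (1.0 / (m - 1))
             for v in centers]
        s = sum(w)
        u.append([wi / s for wi in w])
    return u


def kfcm(data, c, m=2.0, sigma=1.0, iters=100, tol=1e-6, seed=0):
    """Gaussian-kernel fuzzy c-means with prototypes kept in input space."""
    centers = [list(row) for row in random.Random(seed).sample(data, c)]
    for _ in range(iters):
        u = memberships(data, centers, m, sigma)
        new_centers = []
        for i, v in enumerate(centers):
            # Prototype update weighted by u_ik^m * K(x_k, v_i).
            w = [u[k][i] ** m * gaussian_kernel(x, v, sigma) for k, x in enumerate(data)]
            total = sum(w)
            new_centers.append([sum(wk * x[j] for wk, x in zip(w, data)) / total
                                for j in range(len(v))])
        shift = max(abs(a - b) for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(data, centers, m, sigma)


def likfcm_impute(rows, c=2, m=2.0, sigma=1.0, seed=0):
    """Impute None entries: LI pre-fill, then membership-weighted KFCM prototypes."""
    n_attr = len(rows[0])
    cols = [linear_interpolate([r[j] for r in rows]) for j in range(n_attr)]
    filled = [[cols[j][k] for j in range(n_attr)] for k in range(len(rows))]
    centers, u = kfcm(filled, c, m, sigma, seed=seed)
    # Observed values are kept; only originally missing entries are replaced.
    return [[v if v is not None else sum(u[k][i] * centers[i][j] for i in range(c))
             for j, v in enumerate(row)]
            for k, row in enumerate(rows)]


def rmse_mae(truth, imputed, mask):
    """RMSE and MAE over the (row, attribute) positions that were masked."""
    errs = [truth[k][j] - imputed[k][j] for k, j in mask]
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae
```

To mimic the evaluation protocol on a complete dataset, hide a chosen fraction of known entries (the missing ratio), record their (row, attribute) positions, call `likfcm_impute`, and pass the original values and those positions to `rmse_mae`. Because the Gaussian kernel is scale-sensitive, attributes would normally be normalized (for example to [0, 1]) before choosing σ.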
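The abstract reports a Kendall's W test over the methods' mean ranks. The following pure-Python sketch shows one way to compute it; it is not the authors' code. It assumes that each (dataset, missing-ratio) setting acts as a "judge" that ranks the imputation methods by an error metric such as RMSE, with rank 1 for the lowest error and tied errors sharing their average rank. It uses the standard tie-corrected formulation W = 12S / (m²(n³ - n) - mΣT), where m is the number of settings, n the number of methods, S the sum of squared deviations of the per-method rank totals from their mean m(n + 1)/2, and T = Σ(t³ - t) over the tie groups of each setting. The function names `average_ranks` and `kendalls_w` are hypothetical.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest error); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def kendalls_w(scores):
    """scores[j][i]: error of method i under setting j (a dataset/missing-ratio pair).

    Returns (W, chi_square, mean_ranks) using the standard tie correction.
    """
    m, n = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    totals = [sum(r[i] for r in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie groups within a setting share one distinct average rank value.
    ties = 0.0
    for row in rank_rows:
        for r in set(row):
            t = row.count(r)
            ties += t ** 3 - t
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every setting ties all methods")
    w = 12 * s / denom
    chi_square = m * (n - 1) * w
    mean_ranks = [t / m for t in totals]
    return w, chi_square, mean_ranks
```

W ranges from 0 (no agreement between settings) to 1 (every setting ranks the methods identically). The returned `chi_square = m(n - 1)W` can be compared with a chi-square distribution with n - 1 degrees of freedom, for example via `scipy.stats.chi2.sf(chi_square, n - 1)`, to obtain a p-value. Under this ranking convention, the best-performing method is the one with the smallest mean rank.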
