Probabilistic neural network based categorical data imputation

Kancherla Jonah Nishanth, Vadlamani Ravi

doi:10.1016/j.neucom.2016.08.044

Abstract

Real-world datasets contain both numerical and categorical attributes, and missing values often occur in both. Missing data must be imputed because inferences made from complete data are often more accurate and reliable than those made from incomplete data [15]. Moreover, most data mining algorithms cannot work with incomplete datasets. This paper proposes a novel soft computing architecture for categorical data imputation. The proposed technique employs a Probabilistic Neural Network (PNN) preceded by mode imputation to impute missing categorical data. Its effectiveness is tested on 4 benchmark datasets under a 10-fold cross-validation framework. All datasets except Mushroom are complete, so in those datasets some values are randomly removed and treated as missing. The performance of the proposed technique is compared with that of 3 statistical and 3 machine learning methods for data imputation. The comparison of the mode+PNN technique with mode, K-Nearest Neighbor (K-NN), Hot Deck (HD), Naive Bayes, Random Forest (RF), and J48 (decision tree) imputation demonstrates that the proposed method is efficient, especially when the percentage of missing values is high, for records with more than one missing value, and for records with a large number of categories for each categorical variable.
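The abstract does not give implementation details, so the following is only a minimal sketch of how a mode-then-PNN imputer could look, not the authors' implementation. It assumes that missing values are first filled with the column mode, that predictors are one-hot encoded, and that the PNN uses a Gaussian kernel with a fixed smoothing parameter. The names `MISSING`, `mode_fill`, `one_hot`, `pnn_predict`, `impute`, and the value `sigma=0.5` are all hypothetical choices made for illustration.

```python
from collections import Counter

import numpy as np

MISSING = None  # assumed marker for a missing categorical value


def mode_fill(column):
    """Replace missing entries with the most frequent observed category."""
    observed = [v for v in column if v is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is MISSING else v for v in column]


def one_hot(rows, cols):
    """One-hot encode the chosen columns of fully filled categorical rows."""
    cats = {j: sorted({r[j] for r in rows}) for j in cols}
    return np.array(
        [[float(r[j] == c) for j in cols for c in cats[j]] for r in rows]
    )


def pnn_predict(X_train, y_train, X_query, sigma=0.5):
    """Classify queries with a PNN: Gaussian pattern units, per-class
    averaging in the summation layer, then an argmax decision."""
    classes = sorted(set(y_train))
    labels = np.array(y_train)
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    scores = np.stack(
        [kernel[:, labels == c].mean(axis=1) for c in classes], axis=1
    )
    return [classes[i] for i in scores.argmax(axis=1)]


def impute(rows, sigma=0.5):
    """Mode-fill every column, then re-estimate each originally missing
    cell with a PNN trained on the records where that column is observed."""
    n_cols = len(rows[0])
    filled_cols = [mode_fill(list(col)) for col in zip(*rows)]
    filled = [list(r) for r in zip(*filled_cols)]
    result = [list(r) for r in filled]
    for j in range(n_cols):
        miss = [i for i, r in enumerate(rows) if r[j] is MISSING]
        if not miss:
            continue
        obs = [i for i, r in enumerate(rows) if r[j] is not MISSING]
        others = [c for c in range(n_cols) if c != j]
        X = one_hot(filled, others)  # predictors use mode-filled values
        targets = [rows[i][j] for i in obs]
        for i, p in zip(miss, pnn_predict(X[obs], targets, X[miss], sigma)):
            result[i][j] = p
    return result


# Toy data (invented for illustration): the column mode is "banana",
# but the other attributes of the incomplete record match the apples.
rows = [
    ["red", "round", "apple"],
    ["red", "round", "apple"],
    ["yellow", "long", "banana"],
    ["yellow", "long", "banana"],
    ["yellow", "long", "banana"],
    ["red", "round", MISSING],
]
completed = impute(rows)
print(completed[5])
```

On this toy input, mode imputation alone would fill the last cell with "banana", while the PNN step uses the record's other attributes and selects "apple". In practice the smoothing parameter would need tuning, and a column with no observed values is not handled by this sketch.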

Journal: Neurocomputing	Publication Date: Aug 27, 2016
Citations: 59
