Abstract

Many data mining and data analysis techniques operate on large datasets. Such datasets often contain missing values, which lead to biased estimates, imprecise statistical results, or invalid conclusions, and data mining and data analysis techniques cannot be applied directly to datasets with missing values. To address this, various imputation techniques have been proposed for both categorical and continuous variables. The existing techniques have several limitations: (a) methods such as conditional mean imputation result in biased parameter estimates; (b) random draw imputation introduces too much variation into the inference of any single value or of the distance between particular samples; and (c) in multiple imputation, it is not easy to determine the posterior distribution from which to draw samples. In this paper, we present an unsupervised learning technique based on a Kohonen self-organizing map that handles both categorical and numerical data values. Our aim is to achieve the highest accuracy. To this end, we train the learning model using a data-splitting approach and use the trained model to estimate accuracy. By adjusting its weights, the proposed algorithm maps missing values close to their original values, and it improves accuracy in a comparison of classification without missing values and with missing values.
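The general idea of imputing with a self-organizing map can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `train_som` and `impute`, the 5 x 5 grid, the Gaussian neighborhood, and the linear decay schedules are all assumptions made for illustration. The sketch also assumes that numerical features are scaled to [0, 1], that categorical features have already been encoded numerically (for example, one-hot), and that missing entries are marked as NaN. Distances to the map units are computed over the observed features only, and each missing entry is filled from the weights of the best-matching unit.

```python
import numpy as np


def train_som(data, grid=(5, 5), epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM on rows that may contain NaN (missing) entries.

    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid position of every unit, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.random((rows * cols, data.shape[1]))
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            x = data[i]
            obs = ~np.isnan(x)  # features observed in this sample
            frac = step / n_steps
            step += 1
            if not obs.any():
                continue  # nothing to learn from a fully missing row
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood
            # Best-matching unit, using observed features only.
            bmu = np.argmin(((weights[:, obs] - x[obs]) ** 2).sum(axis=1))
            # Gaussian neighborhood around the BMU on the grid.
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
            # Move observed components of every unit toward the sample.
            weights[:, obs] += lr * h[:, None] * (x[obs] - weights[:, obs])
    return weights


def impute(data, weights):
    """Fill NaN entries with the matching components of each row's BMU."""
    filled = data.copy()
    for i, x in enumerate(data):
        miss = np.isnan(x)
        if not miss.any() or miss.all():
            continue  # complete rows need no work; empty rows are left as-is
        obs = ~miss
        bmu = np.argmin(((weights[:, obs] - x[obs]) ** 2).sum(axis=1))
        filled[i, miss] = weights[bmu, miss]
    return filled
```

Under these assumptions, `impute(data, train_som(data))` returns a copy of `data` in which observed entries are unchanged and missing entries are replaced by the corresponding weights of the best-matching unit. Rows with no observed features are left untouched, since no unit can be matched to them.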
