Abstract

Many data mining and data analysis techniques operate on large datasets. These datasets often contain missing values, which lead to biased estimates, imprecise statistical results, or invalid conclusions, and most data mining and data analysis techniques cannot be applied directly to datasets with missing values. For this reason, various imputation techniques have been proposed for both categorical and continuous variables. The existing techniques have several limitations: (a) methods such as conditional mean imputation result in biased parameter estimates; (b) random draw imputation introduces too much variation into the inference of any single value or of the distance between particular samples; and (c) in multiple imputation, it is not easy to determine the posterior distribution of samples to draw from. In this paper, we present an unsupervised learning technique based on a Kohonen self-organizing map that handles both categorical and numerical data values. Our aim is to achieve the highest possible accuracy. To this end, we train the model using a splitting approach and then use the trained model to predict accuracy. By adjusting its weights, the proposed algorithm maps missing values close to their original values, improving accuracy when classification without missing values is compared with classification with missing values.
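To make the general idea concrete, the following is a minimal sketch of self-organizing-map imputation for numerical data, not the implementation evaluated in this paper. It assumes a common formulation in which missing entries are ignored when finding the best-matching unit (BMU) and when updating weights, and each missing entry is then filled from the BMU's weight vector. The function names `train_som` and `impute`, the grid size, the learning-rate and neighbourhood schedules, and the synthetic data are all illustrative assumptions; categorical variables would first need a numerical encoding such as one-hot, which is not shown.

```python
import numpy as np

def train_som(X, grid=(4, 4), epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM on the rows of X, ignoring NaN entries (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], dtype=float)
    # Initialise weights around the observed column means (an assumption).
    weights = np.nanmean(X, axis=0) + 0.1 * np.nanstd(X, axis=0) * rng.standard_normal(
        (len(coords), X.shape[1]))
    n_steps, step = epochs * len(X), 0
    for _ in range(epochs):
        for idx in rng.permutation(len(X)):
            x = X[idx]
            obs = ~np.isnan(x)
            frac = step / n_steps
            step += 1
            if not obs.any():
                continue  # a row with no observed values carries no information
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.3  # shrinking neighbourhood width
            # BMU search uses only the observed dimensions of x.
            bmu = np.argmin(((weights[:, obs] - x[obs]) ** 2).sum(axis=1))
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            # Move weights toward x on the observed dimensions only.
            weights[:, obs] += lr * h[:, None] * (x[obs] - weights[:, obs])
    return weights

def impute(X, weights):
    """Fill each NaN with the corresponding weight of the row's best-matching unit."""
    X_filled = X.copy()
    for i, x in enumerate(X):
        miss = np.isnan(x)
        if not miss.any() or miss.all():
            continue  # nothing to fill, or no observed values to match on
        obs = ~miss
        bmu = np.argmin(((weights[:, obs] - x[obs]) ** 2).sum(axis=1))
        X_filled[i, miss] = weights[bmu, miss]
    return X_filled

# Synthetic two-cluster data (hypothetical, for demonstration only).
rng = np.random.default_rng(1)
X_true = np.vstack([rng.normal(0.0, 0.5, (100, 3)), rng.normal(3.0, 0.5, (100, 3))])
mask = np.zeros(X_true.shape, dtype=bool)
rows = rng.choice(len(X_true), size=40, replace=False)
mask[rows, rng.integers(0, 3, size=40)] = True  # at most one missing value per row
X_missing = X_true.copy()
X_missing[mask] = np.nan

weights = train_som(X_missing)
X_filled = impute(X_missing, weights)
rmse = np.sqrt(np.mean((X_filled[mask] - X_true[mask]) ** 2))
print("RMSE on imputed entries:", rmse)
```

Because the original values are known in this synthetic setting, the error on the imputed entries can be measured directly; on real data, a held-out split with artificially removed values would play the same role.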
