Abstract
Missing data is a common data quality problem, and real-world datasets often contain many missing values. Imputing missing values is a challenging issue in machine learning and data mining. Missing data must be handled carefully; otherwise, it degrades the quality of the mining process and the performance of classification algorithms. Mean imputation is the most common method for replacing missing values. In this paper, we address the negative impact of missing value imputation and propose a solution for improvement, evaluating the performance of the kNN algorithm for classification of diabetes data. We selected a diabetes dataset because it contains many missing values, which makes the impact of imputation clearly visible. To measure performance, we used accuracy and error rate as the metrics.
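The evaluation pipeline described above (mean imputation, kNN classification, then accuracy and error rate) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of `None` to mark a missing value, the choice of Euclidean distance with k = 3, and the toy numbers are all assumptions made for illustration only.

```python
import math
from collections import Counter


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of predictions that are wrong, i.e. 1 - accuracy."""
    return 1.0 - accuracy(y_true, y_pred)


# Hypothetical toy data (not taken from the paper): two features, two classes,
# with None marking a missing measurement.
raw_train = [
    [1.0, 2.0], [None, 1.0], [2.0, None],   # class 0
    [8.0, 9.0], [9.0, None], [None, 8.0],   # class 1
]
train_labels = [0, 0, 0, 1, 1, 1]

train_filled = mean_impute(raw_train)

test_points = [[1.5, 1.5], [8.5, 8.5]]
test_labels = [0, 1]
predictions = [knn_predict(train_filled, train_labels, p, k=3)
               for p in test_points]

print("imputed training rows:", train_filled)
print("accuracy:", accuracy(test_labels, predictions))
print("error rate:", error_rate(test_labels, predictions))
```

In this toy set, every imputed entry receives the same column mean regardless of class, so the filled-in points are pulled toward the centre of the data. That is one way mean imputation can distort the distances kNN relies on.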