Abstract

Knowledge Discovery in Databases (KDD) plays a vital role in applications based on information analysis and retrieval. Data quality is an indispensable component of KDD, and one major factor that degrades it is the presence of missing values. Data collected from the real world often suffers from serious quality problems: it may be incomplete, redundant, inconsistent, and/or noisy. Missing values must be handled carefully, or else bias may be introduced into the induced knowledge. This work investigates three treatments for missing values in the United States Congressional Voting Records Database. All machine learning methods were run in one of the leading open-source data mining applications. The study focuses on evaluating the performance of several classification models induced from the data after applying each of the three treatments. Results show that imputation with boosted k-nearest neighbor offers a significant improvement over the traditional techniques (case/pairwise deletion and replacing missing values with the mean).
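To make the three treatments concrete, the following is a minimal sketch rather than the study's actual implementation. It assumes votes are encoded numerically (1 for yea, 0 for nay, None for missing), uses a plain, unboosted k-nearest-neighbor imputer with a mean-absolute-difference distance over shared attributes, and all function names and the toy records are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[float]]


def case_deletion(rows: List[Row]) -> List[Row]:
    """Drop every record that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_imputation(rows: List[Row]) -> List[Row]:
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: fall back to 0.0 if a column has no observed values.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a: Row, b: Row) -> float:
    """Mean absolute difference over the attributes observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_imputation(rows: List[Row], k: int = 3) -> List[Row]:
    """Fill each missing value with the mean of that attribute among the
    k nearest rows in which the attribute is observed."""
    result = []
    for i, r in enumerate(rows):
        filled = list(r)
        for j, v in enumerate(r):
            if v is not None:
                continue
            donors = [
                (_distance(r, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            nearest = [value for _, value in donors[:k]]
            if nearest:
                filled[j] = sum(nearest) / len(nearest)
        result.append(filled)
    return result


# Hypothetical toy records: 1 = yea, 0 = nay, None = missing vote.
votes: List[Row] = [
    [1, 1, 0, 1],
    [1, 1, 0, None],
    [0, 0, 1, 0],
    [0, None, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

complete_only = case_deletion(votes)
mean_filled = mean_imputation(votes)
knn_filled = knn_imputation(votes, k=2)
```

In this toy data, case deletion discards two of the six records, mean imputation fills each gap with a column-wide fraction that ignores how the legislator voted elsewhere, and the nearest-neighbor imputer borrows the value from records with the most similar voting pattern.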
