Effects of Data Imputation Methods on Data Missingness in Data Mining

Marvin L Brown, Chien-Hua Mike Lin

doi:10.5176/2010-2283_1.2.53

Abstract

The purpose of this paper is to study the effectiveness of data imputation methods in dealing with data missingness in the data mining phase of Knowledge Discovery in Databases (KDD). Applying data mining techniques without careful consideration of missing data can lead to biased results and skewed conclusions. This research explores the impact of data missingness at various levels in KDD models employing neural networks as the primary data mining algorithm. Four of the most commonly utilized data imputation methods (Case Deletion, Mean Substitution, Regression Imputation, and Multiple Imputation) were evaluated using Root Mean Square (RMS) values, ANOVA testing, t-tests, and Tukey’s Honestly Significant Difference test to assess differences in performance between various knowledge discovery and neural network models, both in the presence and absence of missing data.

Keywords: KDD; Data Mining; Data Imputation; Missing Data; Neural Networks

Full Text