Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

Qinbao Song, Martin Shepperd, Xiangru Chen, Jun Liu

doi:10.1016/j.jss.2008.05.008

Abstract

Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see whether it is better to tolerate missing data or to impute missing values first and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but the missing data pattern and the missing data percentage have a strong negative impact on prediction (or imputation) accuracy, particularly if the missing data percentage exceeds 40%.
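
To make the "impute first, then learn" idea concrete, the following is a minimal sketch of k-NN imputation. It is not the authors' implementation: the function name `knn_impute`, the toy `projects` table, the unnormalized Euclidean distance over shared features, and the use of a plain mean over the k nearest donors are all assumptions chosen for illustration.

```python
import math

def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled in.

    Each missing value is replaced by the mean of that feature over the
    k nearest rows that have the feature observed. Distance is Euclidean,
    computed only over features observed in both rows and averaged over
    their count (an illustrative choice, not taken from the paper).
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for d, other in enumerate(rows):
                # A donor must be a different row with feature j observed.
                if d == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda t: t[0])
                nearest = candidates[:k]
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

# Hypothetical project table: [size, team members, effort]; None marks a missing value.
projects = [
    [10.0, 2.0, 100.0],
    [12.0, 2.0, 120.0],
    [50.0, 8.0, 500.0],
    [11.0, 2.0, None],
]
filled = knn_impute(projects, k=2)
print(filled[3])
```

In the workflow the abstract describes, a completed table like `filled` would then be passed to the decision-tree learner instead of leaving the gaps for the learner to handle. In practice, features on different scales would usually be normalized before distances are computed.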

Journal: The Journal of Systems & Software
Publication Date: May 17, 2008
Citations: 122

Similar Papers

A new imputation method for small software project data sets
Qinbao Song ... Martin Shepperd
The Journal of Systems & Software | VOL. 80
16 Jun 2006

Missing data techniques in analogy-based software development effort estimation
Ali Idri ... Alain Abran
The Journal of Systems & Software | VOL. 117
27 Apr 2016

A Short Note on Safest Default Missingness Mechanism Assumptions
Qinbao Song ... Martin Shepperd
Empirical Software Engineering | VOL. 10
01 Apr 2005

Fuzzy case‐based‐reasoning‐based imputation for incomplete data in software engineering repositories
Ibtissam Abnane ... Ali Idri
Journal of Software: Evolution and Process | VOL. 32
16 Mar 2020
