Abstract

Missing data are a serious issue that degrades the prediction accuracy of software development effort estimation (SDEE) techniques, especially analogy-based software effort estimation (ASEE). Appropriate handling of missing data is therefore necessary to ensure the best performance. K-nearest neighbors (KNN) imputation has been widely used to deal with this issue. However, none of the studies investigating KNN imputation in SDEE has addressed the impact of parameter settings on the imputation process, even though parameter optimization techniques are often used at the prediction level, where they strongly affect the performance of SDEE techniques, including ASEE. This paper proposes and evaluates an ensemble KNN imputation technique for ASEE. We then compare ASEE performance under ensemble KNN imputation with ASEE performance under either a grid-search-based single KNN imputation or KNN imputation without parameter optimization. On the six datasets used for comparison, ensemble KNN imputation significantly improved ASEE performance compared with KNN imputation without optimization. Moreover, ensemble KNN imputation and grid-search-based imputation behaved similarly. Given that grid search is time-consuming, ensemble KNN imputation may be an alternative for dealing with missing data in the ASEE process.
