Support vector regression‐based imputation in analogy‐based software development effort estimation

Ali Idri, Ibtissam Abnane, Alain Abran

doi:10.1002/smr.2114

Abstract

Missing data (MD) is a widespread problem that can affect the ability to use data to construct effective software development effort estimation (SDEE) techniques. To deal with this challenge, several imputation techniques have been investigated in SDEE and k‐nearest neighbors (KNN)‐based imputation is still the most frequently used. To the best of our knowledge, no study has used support vector regression (SVR)‐based imputation to construct accurate estimation techniques, in particular those based on analogy. This paper introduces a new imputation technique based on SVR for handling MD in two analogy‐based SDEE techniques: classical analogy and fuzzy analogy. More specifically, we investigate whether the use of SVR instead of KNN in imputing MD improves the predictive performance of these two analogy‐based techniques. A total of 1134 experiments were conducted involving seven datasets, SVR/KNN MD imputation techniques (KNN with Euclidean and Manhattan distances), three missingness mechanisms (missing completely at random, missing at random, non‐ignorable missing), and MD percentages from 10% to 90%. The results suggest that the use of SVR imputation, rather than KNN imputation, may improve the prediction performance of both analogy‐based techniques. Furthermore, we found that the impact of MD percentage upon effort prediction performance is reduced when using SVR rather than KNN. Moreover, fuzzy analogy generates better estimates in terms of the standardized accuracy measure than classical analogy regardless of the MD technique, the dataset used, the missingness mechanism, or the MD percentage.
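
As an illustration of the experimental design summarized above, the following minimal sketch enumerates one factorial grid that is consistent with the reported total of 1134 experiments. It rests on two assumptions not stated in the abstract: that the MD percentages vary in steps of 10% (nine levels from 10% to 90%), and that every configuration is evaluated with both analogy‐based techniques. The dataset names and the mechanism abbreviations are placeholders introduced for the example.

```python
from itertools import product

# Seven datasets (placeholder names; the abstract does not list them).
datasets = [f"dataset_{i}" for i in range(1, 8)]

# Three imputation techniques: SVR, and KNN with two distance measures.
imputers = ["SVR", "KNN-Euclidean", "KNN-Manhattan"]

# Three missingness mechanisms (abbreviations chosen for this sketch):
# missing completely at random, missing at random, non-ignorable missing.
mechanisms = ["MCAR", "MAR", "NIM"]

# Assumption: MD percentages from 10% to 90% in steps of 10%.
md_percentages = list(range(10, 100, 10))

# Assumption: each configuration is run with both analogy-based techniques.
estimators = ["classical analogy", "fuzzy analogy"]

# Full factorial design: one experiment per combination of factor levels.
grid = list(product(datasets, imputers, mechanisms, md_percentages, estimators))

print(len(grid))  # 7 * 3 * 3 * 9 * 2 = 1134
```

Under these assumptions the product 7 × 3 × 3 × 9 × 2 reproduces the total given in the abstract; the full text remains the authoritative description of the design.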

Full Text