Software Estimation in the Design Stage with Statistical Models and Machine Learning: An Empirical Study

Ángel J. Sánchez-García,María Saarayim González-Hernández,Karen Cortés-Verdín,Juan Carlos Pérez-Arriaga

doi:10.3390/math12071058

Abstract

Accurate estimation of software effort and time in the software development process is a key activity to achieve the necessary product quality. However, underestimation or overestimation of effort has become a key challenge for software development. One of the main problems is the estimation with metrics from late stages, because the product must already be finished to make estimates. In this paper, the use of statistical models and machine learning approaches for software estimation are used in early stages such as software design, and a data set is presented with metric values of design artifacts with 37 software projects. As results, models for the estimation of development time and effort are proposed and validated through leave-one-out cross-validation. Further, machine learning techniques were employed in order to compare software projects estimations. Through the statistical tests, it was proven that the errors were not statistically different with the regression models for effort estimation. However, with Random Forest the best statistical results were obtained for estimating development time.

Full Text