Abstract

Context: Publicly available software cost estimation datasets are outdated and may not represent current industrial environments. As a result, most research has concentrated on developing and evaluating estimation models, with limited evidence of their applicability to industrial practice. Moreover, these datasets and models may not be applicable in under-represented, technically and economically constrained environments such as the software development environment in Sudan.

Objective: This paper aims to develop a machine learning model that is suitable for the Sudanese software industry. To demonstrate the suitability of our approach, we evaluate our model using the publicly available SEERA (Software enginEERing in SudAn) dataset, a software cost estimation dataset collected from organizations in Sudan.

Method: We demonstrated the suitability of the SEERA dataset for effort estimation by comparing the attributes that had a high correlation with actual effort and actual duration to the cost factors identified by Sudanese experts. In addition, we developed an early-stage Random Forest model to estimate project effort and duration from the SEERA dataset. Early-stage estimation is in line with current Sudanese industrial practice. We investigated the impact of oversampling, feature selection, heterogeneity and local environmental factors on model accuracy.

Results: Our experimental results showed that the Random Forest model with oversampling and feature selection provided accurate estimates that were better than random guessing (standardized accuracy > 70%). Our results were similar to accuracies reported in the literature. In addition, we demonstrated that our Random Forest model provided estimates that were more accurate than Sudanese expert judgement.

Conclusion: This study has demonstrated the feasibility of our Random Forest model for early-stage effort and duration estimation for Sudanese software projects. The results demonstrate the importance of representative models and datasets for non-traditional technical environments. Further research is required to investigate the impact of local environmental factors on software cost estimation.
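The results are reported as standardized accuracy (SA) relative to random guessing. The abstract does not state how SA was computed, so the following is only a minimal sketch of the measure as it is commonly defined in the effort estimation literature: SA = 1 - MAE / MAE_P0, where MAE_P0 is the mean absolute error of a baseline that predicts each project using the actual value of another randomly chosen project. The function names, the number of runs and the effort values below are illustrative assumptions and are not taken from the SEERA dataset or from the study.

```python
import random
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Mean absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def random_guess_mae(actual, runs=1000, seed=0):
    """MAE of the random-guessing baseline (MAE_P0).

    Each project is 'predicted' with the actual value of a different,
    randomly chosen project; the error is averaged over many runs.
    """
    rng = random.Random(seed)
    n = len(actual)
    errors = []
    for _ in range(runs):
        for i in range(n):
            j = rng.choice([k for k in range(n) if k != i])
            errors.append(abs(actual[i] - actual[j]))
    return mean(errors)


def standardized_accuracy(actual, predicted, runs=1000, seed=0):
    """SA = 1 - MAE / MAE_P0; values above 0 beat random guessing."""
    baseline = random_guess_mae(actual, runs, seed)
    return 1.0 - mean_absolute_error(actual, predicted) / baseline


# Made-up effort values (person-hours), for illustration only.
actual_effort = [120.0, 300.0, 80.0, 450.0, 200.0]
predicted_effort = [130.0, 280.0, 95.0, 400.0, 210.0]

sa = standardized_accuracy(actual_effort, predicted_effort)
print(f"Standardized accuracy: {sa:.2%}")
```

Under this definition, a value of 0 means the model is no better than random guessing, and a value of 1 means every estimate matches the actual value exactly.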

