A genetic algorithm based framework for software effort prediction

Juan Murillo-Morera, Carlos Castro-Herrera, Christian Quesada-López, Marcelo Jenkins

doi:10.1186/s40411-017-0037-x

Open Access

Abstract

Background

Several prediction models have been proposed in the literature, using different techniques and obtaining different results in different contexts. Accurate effort prediction for projects is one of the most critical and complex issues in the software industry. Automatically selecting and combining techniques in alternative ways could improve the overall accuracy of prediction models.

Objectives

In this study, we validate an automated genetic framework and conduct a sensitivity analysis across different genetic configurations. We then compare the framework with a random-guessing baseline and with an exhaustive framework. Lastly, we investigate the performance of the best learning schemes.

Methods

Our search space consists of six hundred learning schemes, formed by combining eight data preprocessors, five attribute selectors and fifteen modeling techniques (8 × 5 × 15 = 600). Using elitism, the genetic framework selects the best learning schemes automatically. In this context, the best learning scheme is the combination of data preprocessing, attribute selection and learning algorithm that achieves the highest possible correlation coefficient. The selected learning schemes are applied to eight datasets extracted from the ISBSG R12 dataset.

Results

The genetic framework performs as well as an exhaustive framework. The analysis of the standardized accuracy (SA) measure revealed that all of the best learning schemes selected by the genetic framework outperform the random-guessing baseline by 45–80%. The sensitivity analysis confirms that the framework is stable across different genetic configurations.

Conclusions

The genetic framework is stable, performs better than a random-guessing approach, and is as good as an exhaustive framework. Our results confirm previous findings in the field: simple regression techniques with transformations can perform as well as nonlinear techniques, and ensembles of learning machine techniques such as SMO, M5P or M5R could optimize effort predictions.
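As a rough illustration of the search described in the Methods, the Python sketch below runs an elitist genetic algorithm over the 600-scheme space. Each candidate is encoded as three integer genes (preprocessor, attribute selector, learner). The encoding, population size, binary tournament selection, uniform crossover, mutation rate and the toy fitness function are all assumptions made for illustration, not the authors' configuration. In the study, fitness is the correlation coefficient obtained by evaluating a scheme on an ISBSG R12 dataset; here that evaluation is left to a caller-supplied `fitness` callable. The hypothetical `exhaustive_search` helper scores all 600 schemes, mirroring the exhaustive framework used for comparison.

```python
import random
from itertools import product
from typing import Callable, Dict, List, Tuple

# Search-space sizes from the abstract: 8 preprocessors x 5 attribute
# selectors x 15 learners = 600 learning schemes.
GENE_SIZES = (8, 5, 15)

# Assumed encoding: (data preprocessor, attribute selector, learner) indices.
Scheme = Tuple[int, ...]


def random_scheme(rng: random.Random) -> Scheme:
    return tuple(rng.randrange(n) for n in GENE_SIZES)


def crossover(a: Scheme, b: Scheme, rng: random.Random) -> Scheme:
    # Uniform crossover: each gene comes from either parent with probability 0.5.
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def mutate(s: Scheme, rate: float, rng: random.Random) -> Scheme:
    # Each gene is replaced by a random valid index with probability `rate`.
    return tuple(rng.randrange(n) if rng.random() < rate else g
                 for g, n in zip(s, GENE_SIZES))


def genetic_search(fitness: Callable[[Scheme], float], pop_size: int = 20,
                   generations: int = 30, elite: int = 2,
                   mutation_rate: float = 0.1,
                   seed: int = 0) -> Tuple[Scheme, float]:
    """Elitist GA; `fitness` stands in for a scheme's correlation coefficient."""
    rng = random.Random(seed)
    cache: Dict[Scheme, float] = {}

    def score(s: Scheme) -> float:
        # Cache scores: a real evaluation would train and validate a model.
        if s not in cache:
            cache[s] = fitness(s)
        return cache[s]

    population: List[Scheme] = [random_scheme(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_pop = ranked[:elite]  # elitism: the best schemes survive unchanged
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(ranked, 2), key=score)
            p2 = max(rng.sample(ranked, 2), key=score)
            next_pop.append(mutate(crossover(p1, p2, rng), mutation_rate, rng))
        population = next_pop
    best = max(population, key=score)
    return best, score(best)


def exhaustive_search(fitness: Callable[[Scheme], float]) -> Scheme:
    # Baseline: evaluate every one of the 600 combinations.
    return max(product(*(range(n) for n in GENE_SIZES)), key=fitness)


if __name__ == "__main__":
    # Hypothetical toy fitness with a single optimum at (3, 1, 7).
    def toy_fitness(s: Scheme) -> float:
        return -float(abs(s[0] - 3) + abs(s[1] - 1) + abs(s[2] - 7))

    print("GA:", genetic_search(toy_fitness))
    print("Exhaustive:", exhaustive_search(toy_fitness))
```

Because the top schemes are copied unchanged into every new generation, the best fitness in the population never decreases from one generation to the next, which is what makes elitism a natural fit when the goal is to match an exhaustive search at a fraction of the evaluations.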

Highlights

Several prediction models have been proposed in the literature, using different techniques and obtaining different results in different contexts
The genetic framework performs as well as an exhaustive framework
The analysis of the standardized accuracy (SA) measure revealed that all of the best learning schemes selected by the genetic framework outperform the random-guessing baseline by 45–80%
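The standardized accuracy figure above compares a model's mean absolute residual (MAR) with the MAR of random guessing, SA = (1 − MAR / MAR_P0) × 100, following the measure proposed by Shepperd and MacDonell (2012). The Python sketch below assumes that random guessing predicts each case with the actual effort of another, randomly chosen case, and that MAR_P0 is averaged over 1000 runs. The run count, the seed and the function names are illustrative choices, not details taken from this paper.

```python
import random
from statistics import mean
from typing import Sequence


def mar(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Mean absolute residual.
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mar_random_guessing(actual: Sequence[float], runs: int = 1000,
                        seed: int = 0) -> float:
    """Mean MAR of random guessing (assumed scheme): each case is 'predicted'
    by the actual value of another case drawn uniformly at random."""
    n = len(actual)
    if n < 2:
        raise ValueError("need at least two cases")
    rng = random.Random(seed)
    run_mars = []
    for _ in range(runs):
        guesses = []
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # skip the target case itself
            guesses.append(actual[j])
        run_mars.append(mar(actual, guesses))
    return mean(run_mars)


def standardized_accuracy(actual: Sequence[float], predicted: Sequence[float],
                          runs: int = 1000, seed: int = 0) -> float:
    # SA = (1 - MAR / MAR_P0) * 100
    return (1 - mar(actual, predicted)
            / mar_random_guessing(actual, runs, seed)) * 100
```

Read this way, an SA of 45 means the model's MAR is 55% of the random-guessing MAR, that is, a 45% reduction in absolute error relative to the baseline.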

Summary

Introduction

Several prediction models have been proposed in the literature, using different techniques and obtaining different results in different contexts. Accurate effort prediction for projects is one of the most critical and complex issues in the software industry. Software effort prediction models have been studied for many years, but empirical evaluation has led to neither simple nor consistent ways of interpreting their results (Shepperd and MacDonell 2012). Many prediction models have been evaluated in the literature, and inconsistent findings have been reported regarding which technique is best (Jorgensen and Shepperd 2007; Shepperd 2007; Dejaeger et al. 2012; Shepperd and MacDonell 2012). The results of these studies are not univocal and are often highly technique- and dataset-dependent. Which modeling technique to use for software effort estimation remains an open research question (Dejaeger et al. 2012).

Objectives

Methods

Results

Conclusion