Feature Engineering for Enhanced Model Performance in Software Effort Estimation

Sreekumar P Pillai*,Dr S.D Madhukumar,Dr T Radharamanan

doi:10.35940/ijrte.c5602.098319

Sreekumar P Pillai*, Dr S.D Madhukumar + Show 1 more

Open Access

https://doi.org/10.35940/ijrte.c5602.098319

Copy DOI

Abstract

Many new methodologies have been defined in the last two decades in the domain of Software Effort Estimation. They include manual methods based on expert judgment, analogy-based models, parametric models, regression models, machine learning models, and more recently, deep learning models. Except for manual methods, all other models depend heavily on data. Lack of quality data in this domain is a motivation to explore means to optimize the sparse data available. Machine learning algorithms depend on domain features, and their ability to represent and model the domain, to solve the problems irrespective of whether it is classification or regression, image, or voice synthesis. There is continued research for the best representation of the issue through the right feature space. While most of the traditional research rely on the original dataset and concentrate more on feature selection, modern-day approaches explore creating additional features that have the potential to extend the models representational space.This research builds on our last research exploring the potential to improve Software Effort Estimation accuracy by employing engineered features in addition to the original ones. The features are created manually based on the literature. Through the engineered features, we captured additional representational features such as missingness and proportion of categorical data available in the dataset. We present the rationale for the features generated and compare the prediction accuracy between a model using the original dataset and the engineered data set.Our experiments in Feature Engineering is innovative in the Software Estimation domain and the results conclusive establishing its use in predicting Software Effort. We report an improved accuracy of 38% with engineered features at PRED(15), and 11% improvement at PRED(20). The quantitative growth that we have been able to achieve in terms of accuracy is promising enough for this to be adopted as a standard in future research on the subject and practical applications.

Full Text