Fuzzy C-mean Missing Data Imputation for Analogy-based Effort Estimation

Ayman Jalal Almutlaq, Adila Firdaus Binti Arbain, Dayang N A Jawawi

doi:10.14569/ijacsa.2021.0120874

Open Access

https://doi.org/10.14569/ijacsa.2021.0120874

Abstract

The accuracy of effort estimation is one of the major factors in the success or failure of software projects. Analogy-Based Estimation (ABE) is a widely accepted estimation model because it follows the human tendency to select analogies similar in nature to the target project. The prediction accuracy of the ABE model is strongly associated with the quality of the dataset, since it relies on previously completed projects for estimation. Missing Data (MD) is one of the major challenges in software engineering datasets. Researchers have investigated several missing data imputation techniques for the ABE model. Identifying the most similar donor values in the dataset of completed software projects remains a challenging issue for the existing missing data techniques adopted for ABE. In this study, Fuzzy C-Mean Imputation (FCMI), Mean Imputation (MI) and K-Nearest Neighbor Imputation (KNNI) are investigated to impute missing values in the Desharnais dataset under different missing data percentages (Desh-Miss1, Desh-Miss2) for the ABE model, and the ABE-FCMI technique is proposed. MI, KNNI and ABE-FCMI are compared to identify the most suitable MD imputation method for ABE. The results suggest that ABE-FCMI, rather than MI or KNNI, imputes more reliable values for incomplete software projects in the missing datasets. The proposed imputation method was also found to significantly improve the software development effort prediction of the ABE model.
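To make the two baseline techniques named in the abstract concrete, the following is a minimal sketch of mean imputation and K-nearest-neighbor imputation in plain Python. It is not the authors' implementation: the function names, the toy data, the use of `None` for a missing value, and the rescaled partial Euclidean distance are all assumptions made for illustration, and the sketch assumes every column has at least one observed value.

```python
from math import sqrt

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def partial_distance(a, b):
    """Euclidean distance over features observed in both rows (assumed choice),
    rescaled by the fraction of features actually compared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    total = sum((x - y) ** 2 for x, y in shared)
    return sqrt(total * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Replace each None with the mean of that feature over the k nearest donors."""
    result = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, v in enumerate(row):
            if v is None:
                # Donors are the other projects that have feature j observed.
                donors = [r for idx, r in enumerate(rows)
                          if idx != i and r[j] is not None]
                donors.sort(key=lambda r: partial_distance(row, r))
                nearest = donors[:k]
                filled[j] = sum(r[j] for r in nearest) / len(nearest)
        result.append(filled)
    return result

# Hypothetical projects: [team experience, size]; the last project lacks its size.
projects = [
    [10.0, 100.0],
    [12.0, 110.0],
    [50.0, 500.0],
    [11.0, None],
]

mi_filled = mean_impute(projects)
knn_filled = knn_impute(projects, k=2)
```

On this toy data, mean imputation fills the gap with the column mean, which the one large project pulls upward, while KNN imputation takes the value from the two most similar donors. This is the donor-selection issue the abstract refers to.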

Highlights

Software development effort is considered one of the most significant metrics estimated in software projects, because planning, development, management and all other important aspects of a project depend heavily on accurate estimation of development effort [1].
As a result, ABE-FCMI achieved a significant improvement over K-Nearest Neighbor Imputation (KNNI) and Mean Imputation (MI) on the selected accuracy evaluation measures (MMRE, PRED, and SA) for the Analogy-Based Estimation (ABE) model applied to the Desh-Miss1 incomplete dataset.
ABE, as a widely accepted effort estimation model, depends mainly on the historical dataset of completed projects for effort prediction, so addressing missing values in previously completed projects will improve the accuracy of ABE prediction.
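Two of the accuracy measures named above, MMRE and PRED, can be sketched as follows. The definitions used here are the conventional ones (mean magnitude of relative error, and the fraction of estimates whose relative error is within a threshold). The 0.25 threshold is a common convention and an assumption, since the source does not state the level used, and the effort figures are invented for illustration. SA is not shown.

```python
def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual

def mmre(actuals, predictions):
    """Mean magnitude of relative error over all projects (lower is better)."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)

def pred(actuals, predictions, level=0.25):
    """Fraction of projects whose MRE is at most `level` (higher is better).
    The default level of 0.25 is an assumed convention."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)

# Hypothetical actual and estimated efforts for three projects.
actual_effort = [100.0, 200.0, 400.0]
estimated_effort = [110.0, 160.0, 800.0]

mmre_value = mmre(actual_effort, estimated_effort)
pred_value = pred(actual_effort, estimated_effort)
```

Here two of the three estimates fall within 25% of the actual effort, while the third is off by 100% and dominates the MMRE. This illustrates why the two measures are usually reported together.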

Summary

Introduction

Software development effort is considered one of the most significant metrics estimated in software projects, because planning, development, management and all other important aspects of a project depend heavily on accurate estimation of development effort [1]. Among the many ML models, Analogy-Based Estimation (ABE) is widely accepted because it follows the human tendency to select analogies similar in nature to the target project [4]. Missing data (MD) in software engineering datasets is a major problem that affects the performance of effort prediction models [5, 6]. Imputation is the most investigated technique for handling missing data in software effort estimation, and KNN imputation has been the most commonly adopted method [8]. Analogy-based estimation was proposed by Shepperd and Schofield and is one of the most prominent non-algorithmic effort estimation models [13]. In analogy-based software effort estimation (ASEE), the development effort is derived by comparing the target project with similar completed projects. ABE consists of four parts: Historical completed software engineering projects dataset
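The comparison process described above can be sketched as a nearest-analogy lookup. This is a minimal illustration and not the configuration used in the study: min-max scaling, Euclidean distance, taking the mean effort of the k closest projects, the function names and the toy data are all assumptions.

```python
from math import sqrt

def make_scaler(rows):
    """Build a min-max scaler so every feature contributes on a 0..1 range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]

    def scale(row):
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(row, lows, highs)]
    return scale

def euclidean(a, b):
    """Similarity function (assumed): smaller distance means a closer analogy."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def abe_estimate(history, efforts, target, k=2):
    """Estimate effort as the mean effort of the k most similar past projects."""
    scale = make_scaler(history + [target])
    scaled_target = scale(target)
    ranked = sorted(
        zip(history, efforts),
        key=lambda pair: euclidean(scale(pair[0]), scaled_target),
    )
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)

# Hypothetical completed projects: [size, team experience] with known efforts.
history = [[10.0, 2.0], [12.0, 3.0], [50.0, 9.0]]
efforts = [1000.0, 1300.0, 6000.0]

estimate = abe_estimate(history, efforts, [11.0, 2.0], k=2)
```

Because the estimate is read directly from the historical projects, a missing or poorly imputed feature value changes which projects count as analogies, which is why the quality of imputation matters for ABE.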

Methods

Results

Conclusion