Abstract

Software engineering researchers have worked along multiple dimensions to facilitate better software effort estimates, including improving dataset quality. In this research, we investigate the effectiveness of outlier removal in improving the estimation performance of five machine learning (ML) methods (Support Vector Regression, Random Forest, Ridge Regression, K-Nearest Neighbor, and Gradient Boosting Machines) for software development effort estimation (SDEE). We propose a novel discretization method based on the Golden Section, dubbed Golden Section based Adaptive Discretization (GSAD), to identify the optimal number of outliers to remove from an SDEE dataset. The results highlight the importance of removing an optimal number of outliers to improve estimates. Moreover, the results obtained with GSAD are compared against IQR and Cook's distance based outlier identification methods over four datasets: ISBSG Release 2021, UCP, NASA93, and China. The empirical results confirm that the performance of ML-based SDEE methods generally improves when GSAD is employed, and that the proposed GSAD method can compete with other prevalent outlier identification methods.
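
To illustrate the general idea, the sketch below (not the authors' GSAD implementation, whose discretization details are not given in this abstract) ranks projects by how far their effort values fall outside the IQR fences and then runs a golden-section style search over the number of removed outliers k, scoring each candidate k by the cross-validated mean absolute error of a Ridge Regression estimator, one of the five ML methods considered in the study. The synthetic dataset, feature columns, and the IQR-based ranking criterion are placeholder assumptions.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

PHI = (np.sqrt(5) - 1) / 2  # golden ratio conjugate, ~0.618


def outlier_ranking(y):
    """Rank samples by distance outside the IQR fences of the effort values."""
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    distance = np.maximum(lower - y, y - upper)      # positive only for IQR outliers
    return np.argsort(distance)[::-1], int((distance > 0).sum())


def score_after_removal(X, y, ranked_idx, k):
    """Cross-validated MAE after dropping the k most extreme samples."""
    keep = np.ones(len(y), dtype=bool)
    keep[ranked_idx[:k]] = False
    model = make_pipeline(StandardScaler(), Ridge())
    mae = -cross_val_score(model, X[keep], y[keep],
                           scoring="neg_mean_absolute_error", cv=5).mean()
    return mae


def golden_section_k(X, y, tol=1):
    """Golden-section search over k (number of removed outliers), assuming the
    MAE-versus-k curve is roughly unimodal on [0, number of IQR outliers]."""
    ranked_idx, n_out = outlier_ranking(y)
    a, b = 0, n_out
    while b - a > tol:
        k1 = int(round(b - PHI * (b - a)))
        k2 = int(round(a + PHI * (b - a)))
        if score_after_removal(X, y, ranked_idx, k1) < score_after_removal(X, y, ranked_idx, k2):
            b = k2
        else:
            a = k1
    return a, ranked_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(1, 100, size=(150, 3))            # placeholder project features
    y = X @ np.array([3.0, 1.5, 0.5]) + rng.normal(0, 10, 150)
    y[:8] *= 6                                        # inject a few extreme effort values
    k_best, _ = golden_section_k(X, y)
    print(f"suggested number of outliers to remove: {k_best}")

Golden-section search is attractive in this setting because every evaluation of k requires a full cross-validation run, and the search narrows the candidate interval geometrically under the (assumed) unimodality of the error-versus-k curve, rather than scanning every possible outlier count.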
