Abstract

Finding defects early in a software system is crucial, as it leaves adequate time to fix those defects with the available resources. Strategies such as symmetric testing have proven useful; however, their inability to differentiate incorrect implementations from correct ones is a drawback. Software defect prediction (SDP) is another feasible method for detecting defects early. However, high dimensionality, a data quality problem, has a detrimental effect on the predictive capability of SDP models. Feature selection (FS) has been used as a feasible solution to the high dimensionality issue in SDP. According to current literature, the two basic forms of FS approaches are filter-based feature selection (FFS) and wrapper-based feature selection (WFS). Of the two, WFS approaches have been deemed superior. However, WFS methods have a high computational cost because the number of executions required for feature subset search, evaluation, and selection is not known in advance. WFS also often leads to overfitting of classifier models because its subset evaluator is easily trapped in local maxima. This trapping can be overcome by using an effective search method in the evaluation process. Hence, this study proposes an enhanced WFS (EWFS) method that dynamically and iteratively selects features. EWFS incrementally selects features while considering previously selected features in its search space. The novelty of EWFS lies in enhancing the subset evaluation process of WFS methods with a dynamic re-ranking strategy that iteratively selects germane features with fewer subset evaluation cycles, without compromising the prediction performance of the ensuing model. For evaluation, EWFS was deployed with Decision Tree (DT) and Naïve Bayes (NB) classifiers on software defect datasets of varying granularity.
The experimental findings revealed that EWFS outperformed the existing metaheuristic and sequential search-based WFS approaches evaluated in this work. Additionally, EWFS selected fewer features and required less computational time than the existing metaheuristic and sequential search-based WFS methods.
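The dynamic re-ranking idea can be illustrated with a small sketch. This is not the authors' EWFS algorithm, whose exact steps are not given here. It is one plausible scheme under stated assumptions: (a) an initial best-first feature ranking produced by some filter method, (b) a hypothetical `evaluate(subset)` callback that returns a validation score, such as the cross-validated AUC of an NB or DT model trained on those features, and (c) a hypothetical `window` parameter that limits how many candidates are wrapper-evaluated per cycle. Candidates that fail to beat the previous best score are demoted behind the unevaluated features, so the search keeps moving through the ranking instead of re-testing every remaining feature in each cycle.

```python
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[int, ...]


def rerank_wrapper_select(
    initial_rank: Sequence[int],
    evaluate: Callable[[Subset], float],
    window: int = 3,
) -> Tuple[Subset, float, int]:
    """Incremental wrapper selection with dynamic re-ranking (illustrative sketch).

    initial_rank: feature indices ordered best-first by a filter score (assumed input).
    evaluate:     hypothetical callback scoring a feature subset, e.g. the
                  cross-validated AUC of a classifier trained on those columns.
    window:       hypothetical cap on candidates wrapper-evaluated per cycle.
    Returns (selected subset, its score, number of evaluate() calls).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    ranking: List[int] = list(initial_rank)
    selected: Subset = ()
    best = float("-inf")
    calls = 0
    while ranking:
        # Evaluate only the top `window` candidates, each combined with the
        # features already selected (previous selections stay in the search).
        window_feats = ranking[:window]
        scores = {f: evaluate(selected + (f,)) for f in window_feats}
        calls += len(window_feats)
        top = max(window_feats, key=lambda f: scores[f])
        if scores[top] <= best:
            break  # no candidate in the window improves the current subset
        prev_best = best
        selected, best = selected + (top,), scores[top]
        # Dynamic re-ranking: window candidates that also beat the previous best
        # stay at the front (ordered by score); the rest are demoted to the back.
        others = [f for f in window_feats if f != top]
        promising = sorted((f for f in others if scores[f] > prev_best),
                           key=lambda f: -scores[f])
        demoted = [f for f in others if scores[f] <= prev_best]
        ranking = promising + ranking[window:] + demoted
    return selected, best, calls


# Toy stand-in for a classifier-based scorer: integer "usefulness" weights
# minus a fixed cost per selected feature (purely illustrative numbers).
WEIGHTS = [50, 5, 30, 0, 20]


def toy_score(subset: Subset) -> int:
    return sum(WEIGHTS[f] for f in subset) - 10 * len(subset)


print(rerank_wrapper_select([0, 1, 2, 3, 4], toy_score, window=2))
# -> ((0, 2, 4), 70, 8)
```

On this toy scorer, a plain sequential forward search reaches the same subset after 14 calls to the scorer instead of 8; any real saving would depend on the data, the initial ranking, and the window size.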

Highlights

  • The software development lifecycle (SDLC) is a formal framework that has been planned and built for the production or development of high-quality software systems

  • Performance metrics such as accuracy, area under the curve (AUC), and F-measure were used to test the predictive efficiency of Naïve Bayes (NB) and Decision Tree (DT) classifiers based on the proposed enhanced WFS (EWFS), metaheuristic search-based wrapper feature selection (WFS), and sequential search-based WFS

  • NB and DT classifiers based on EWFS have a high chance of correctly predicting the average defects in software defect prediction (SDP), which translates to good prediction performance for models based on EWFS
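The three metrics named in the highlights have standard definitions for binary defect labels. The sketch below computes them in plain Python, assuming label 1 marks a defective module and that the classifier outputs a score (for example, a predicted probability) for AUC. It is not the evaluation code used in the study. AUC is computed through its equivalence with the Mann-Whitney statistic: the fraction of (defective, clean) pairs in which the defective module receives the higher score, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Fraction of modules whose predicted label matches the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    # Harmonic mean of precision and recall for the defective (positive) class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # convention: no true positives gives an F-measure of 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    # Probability that a random defective module outscores a random clean one.
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.1, 0.4, 0.8, 0.6, 0.3]
print(round(accuracy(y_true, y_pred), 3),
      round(f_measure(y_true, y_pred), 3),
      round(auc(y_true, scores), 3))
# -> 0.667 0.667 0.889
```

The pairwise AUC loop is quadratic in the number of modules, which is fine for illustration; a rank-based computation scales better on large defect datasets.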



Introduction

The software development lifecycle (SDLC) is a formal framework that has been planned and built for the production or development of high-quality software systems. Errors in software result in unreliable computing systems and software failure. The occurrence of errors in information system modules or components results in flawed, low-quality software systems. It is therefore important to predict and recognize software flaws early, before product delivery or during the software development process. Detecting or predicting incorrect (defective) modules or components in a software system allows those modules or components to be corrected and the available resources to be used wisely [8,9].

