Abstract

High dimensionality is a data quality problem that degrades the predictive performance of models in software defect prediction (SDP). Feature selection (FS) has been used as a viable solution to this problem. Existing studies distinguish two basic types of FS methods: filter-based feature selection (FFS) and wrapper feature selection (WFS). WFS methods are generally regarded as the better-performing of the two. However, WFS methods are known to have a high computational cost, because the number of executions required for feature subset search, evaluation and selection is not known in advance. The subset search is also easily trapped in local maxima, which often leads to overfitting of prediction models. Applying an appropriate search method in the subset evaluation phase of WFS can mitigate this trapping in local maxima. Best First Search (BFS) and Greedy Step-wise Search (GSS) have been used extensively as the conventional search methods in WFS, with positive impact. However, metaheuristic search methods can be as effective as BFS and GSS. Consequently, this study conducts an empirical comparative analysis of 13 search methods (11 state-of-the-art metaheuristic search methods and 2 conventional search methods) in WFS for SDP. The experimental results showed that the metaheuristic search methods (AS, BS, BAT, CS, ES, FS, FLS, GS, NSGA-II, PSOS, RS) performed better as search methods in WFS than the conventional search methods (BFS and GSS), although the average computational time of metaheuristic-based WFS methods is relatively high. We recommend metaheuristic search methods as alternative search methods for WFS in SDP.
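
To make the wrapper idea concrete, the following is a minimal sketch of forward greedy step-wise search wrapped around a classifier-based subset evaluator. It is not the study's implementation: the function names (`loo_accuracy`, `greedy_stepwise`), the leave-one-out 1-nearest-neighbour evaluator, and the toy data are all assumptions chosen for illustration. The search stops as soon as no single added feature improves the score, which is one way a greedy search can stall at a local maximum.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Row = Sequence[float]
Evaluator = Callable[[List[Row], List[int], FrozenSet[int]], float]


def loo_accuracy(X: List[Row], y: List[int], features: FrozenSet[int]) -> float:
    """Hypothetical subset evaluator: leave-one-out accuracy of a
    1-nearest-neighbour classifier that only sees the given feature columns."""
    if not features:
        return 0.0
    cols = sorted(features)
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for j in range(len(X)):
            if i == j:
                continue  # hold out the instance being classified
            dist = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def greedy_stepwise(
    X: List[Row], y: List[int], evaluate: Evaluator
) -> Tuple[FrozenSet[int], float]:
    """Forward greedy step-wise search: add the single best feature per step,
    and stop when no addition strictly improves the evaluator's score."""
    n_features = len(X[0])
    selected: FrozenSet[int] = frozenset()
    best_score = evaluate(X, y, selected)
    while True:
        candidates = [selected | {f} for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Each candidate subset costs one full evaluation of the wrapped model.
        score, subset = max(((evaluate(X, y, c), c) for c in candidates),
                            key=lambda pair: pair[0])
        if score <= best_score:
            break  # no improving neighbour: the search has reached a local maximum
        selected, best_score = subset, score
    return selected, best_score


# Toy data (assumed): column 0 separates the classes, column 1 is misleading noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.9], [1.1, 1.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
selected, score = greedy_stepwise(X, y, loo_accuracy)
```

On this toy data the search keeps only column 0. A metaheuristic search would replace the loop in `greedy_stepwise` with a population-based or randomized exploration of subsets while reusing the same evaluator, at the price of more evaluations.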

