The performance of software defect prediction (SDP) models depend on the quality of considered software features. Redundant features and irrelevant features may reduce the performance of the constructed models, which require feature selection methods to identify and remove them. Previous studies mostly treat feature selection as a single objective optimization problem, and multi-objective feature selection for SDP has not been thoroughly investigated. In this paper, we propose a novel method MOFES (Multi-Objective FEature Selection), which takes two optimization objectives into account. One optimization objective is to minimize the number of selected features, this objective is related to the cost analysis of this problem. Another objective is to maximize the performance of the constructed SDP models, this objective is related to the benefit analysis of this problem. MOFES utilizes Pareto based multi-objective optimization algorithms (PMAs) to solve this problem. In our empirical study, we design and conduct experiments on RELINK and PROMISE datasets, which are gathered from real open source projects. Firstly, we analyze the influence of different PMAs on MOFES and find that NSGA-II can achieve the best performance on both datasets. Then, we compare MOFES method with 22 state-of-the-art filter based and wrapper based feature selection methods, and find that MOFES can effectively select fewer but closely related features to construct high-quality models. Moreover, we also analyze the frequently selected features by MOFES, and these findings can be used to provide guidelines on gathering high-quality SDP datasets. Finally, we analyze the computational cost of MOFES and find that MOFES only needs 107 seconds on average.
Read full abstract