A Hierarchical Feature Ensemble Deep Learning Approach for Software Defect Prediction

Yue Yan,Shujuan Jiang,Shenggang Zhang

doi:10.1142/s0218194023500079

Abstract

Software defect prediction can detect modules that may have defects in advance and optimize resource allocation to improve test efficiency and reduce development costs. Traditional features cannot capture deep semantic and grammatical information, which limits the further development of software defect prediction. Therefore, it has gradually become a trend to use deep learning technology to automatically learn valuable deep features from source code or relevant data. However, most software defect prediction methods based on deep learning extraction features from a single information source or only use a single deep learning model, which leads to the fact that the extracted features are not comprehensive enough to affect the final prediction performance. In view of this, this paper proposes a Hierarchical Feature Ensemble Deep Learning (HFEDL) Approach for software defect prediction. Firstly, the HFEDL approach needs to obtain three types of information sources: abstract syntax tree (AST), class dependency network (CDN) and traditional features. Then, the Convolutional Neural Network (CNN) and the Bidirectional Long Short-Term Memory based on Attention mechanism (BiLSTM+Attention) are used to extract different valuable features from the three information sources and multiple prediction sub-models are constructed. Next, all the extracted features are fused by a filter mechanism to obtain more comprehensive features and construct a fusion prediction sub-model. Finally, all the sub-models are integrated by an ensemble learning method to obtain the final prediction model. We use 11 projects in the PROMISE defect repository and evaluate our approach in both non-effort-aware and effort-aware scenarios. The experimental results show that the prediction performance of our approach is superior to state-of-the-art methods in both scenarios.

Full Text