Semantic and traditional feature fusion for software defect prediction using hybrid deep learning model

Ahmed Abdu,Zhengjun Zhai,Hakim A Abdo,Redhwan Algabri,Mohammed A Al-Masni,Mannan Saeed Muhammad,Yeong Hyeon Gu

doi:10.1038/s41598-024-65639-4

https://doi.org/10.1038/s41598-024-65639-4

Journal: Scientific Reports
Publication Date: Jul 1, 2024
Citations: 3
License type: CC BY 4.0

Abstract

Software defect prediction aims to find a reliable method for predicting defects in a particular software project, helping software engineers allocate limited resources and release high-quality software products. While most earlier research relied on traditional features, current methods increasingly extract semantic features from source code. Traditional features often fail to capture semantic differences between programs, differences that are essential for building reliable and effective prediction models. Conversely, semantic features cannot convey statistical metrics about the source code, such as code size and complexity. Thus, using only one kind of feature degrades prediction performance. To bridge this gap, we propose a novel defect prediction model that integrates traditional and semantic features using a hybrid deep learning approach. Specifically, our model employs a hybrid CNN-MLP classifier: a convolutional neural network (CNN) processes semantic features extracted from projects’ abstract syntax trees (ASTs) using Word2vec, while a multilayer perceptron (MLP) processes the traditional features extracted from the dataset repository. The outputs of the CNN and the MLP are then integrated and fed into a fully connected layer for defect prediction. Extensive experiments are conducted on various open-source projects to validate CNN-MLP’s effectiveness. Experimental results indicate that CNN-MLP significantly enhances defect prediction performance and outperforms existing methods in both non-effort-aware and effort-aware cases.
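The two-branch design described in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the layer sizes, the parameter names, the random untrained weights, and the helper functions (`conv1d_global_max`, `predict_defect`) are all hypothetical, and a random embedding table stands in for Word2vec vectors learned from AST tokens. The sketch only shows how a convolutional branch over token embeddings and an MLP branch over traditional metrics might be concatenated and passed to a fully connected output layer.

```python
import math
import random

random.seed(0)  # fixed seed so the illustrative weights are reproducible


def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these values."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def relu(v):
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def conv1d_global_max(embedded, filters, width):
    """1-D convolution over the token sequence, ReLU, then global max pooling.

    embedded: list of token vectors (seq_len x emb_dim)
    filters:  list of flattened kernels, each of length width * emb_dim
    """
    pooled = []
    for kernel in filters:
        best = 0.0  # ReLU output is never negative, so 0.0 is a safe floor
        for start in range(len(embedded) - width + 1):
            window = [x for vec in embedded[start:start + width] for x in vec]
            best = max(best, sum(k * x for k, x in zip(kernel, window)))
        pooled.append(best)
    return pooled


def predict_defect(token_ids, metrics, params):
    """Return a defect probability for one source file (hypothetical model)."""
    # Semantic branch: embedding lookup (stand-in for Word2vec) followed by the CNN.
    embedded = [params["embedding"][t] for t in token_ids]
    semantic = conv1d_global_max(embedded, params["filters"], params["width"])
    # Traditional branch: one hidden MLP layer over hand-crafted code metrics.
    hidden = relu(dense(metrics, params["mlp_w"], params["mlp_b"]))
    # Fusion: concatenate both branch outputs, then a fully connected output unit.
    fused = semantic + hidden
    logit = dense(fused, params["out_w"], params["out_b"])[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Assumed toy dimensions, chosen only to keep the example small.
VOCAB, EMB, WIDTH, N_FILTERS, N_METRICS, HIDDEN = 20, 4, 3, 5, 6, 8
params = {
    "embedding": rand_matrix(VOCAB, EMB),
    "filters": rand_matrix(N_FILTERS, WIDTH * EMB),
    "width": WIDTH,
    "mlp_w": rand_matrix(HIDDEN, N_METRICS),
    "mlp_b": [0.0] * HIDDEN,
    "out_w": rand_matrix(1, N_FILTERS + HIDDEN),
    "out_b": [0.0],
}

tokens = [3, 7, 7, 1, 12, 5, 9]            # integer-encoded AST tokens (made up)
metrics = [0.4, 0.1, 0.9, 0.3, 0.0, 0.7]   # normalized traditional metrics (made up)
probability = predict_defect(tokens, metrics, params)
print(f"predicted defect probability: {probability:.3f}")
```

Concatenating the pooled CNN output with the MLP hidden vector is one simple way to let the final layer weigh semantic and statistical evidence jointly; in practice the weights would be trained on labeled files rather than drawn at random.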

Full Text