PROMISE Repository Research Articles

As the number of software projects grows, within-project defect prediction (WPDP) can no longer meet demand, and cross-project defect prediction (CPDP) is playing an increasingly significant role in software engineering. Classic CPDP methods concentrate mainly on applying metric features to predict defects. However, these approaches fail to consider the rich semantic information in source code, which usually captures the relationship between software defects and their context. Because traditional methods cannot exploit this information, their performance is often unsatisfactory. In this paper, a transfer long short-term memory (TLSTM) network model is proposed. Transfer semantic features are extracted by adding a transfer learning algorithm to the long short-term memory (LSTM) network. The traditional metric features and semantic features are then combined for CPDP. First, abstract syntax trees (ASTs) are generated from the source code. Second, the AST node contents are converted into integer vectors that serve as inputs to the TLSTM model, which extracts the semantic features of the program. In parallel, transferable metric features are extracted by transfer component analysis (TCA). Finally, the semantic features and metric features are combined and fed into a logistic regression (LR) classifier for training. According to results on several open-source projects from the PROMISE repository, the presented TLSTM model performs better on the F-measure than other machine learning and deep learning models. The TLSTM model built with a single feature achieves 0.7% and 2.1% improvements on Log4j-1.2 and Xalan-2.7, respectively. The model trained with combined features is called transfer long short-term memory for defect prediction (DPTLSTM). DPTLSTM achieves 2.9% and 5% improvements on Synapse-1.2 and Xerces-1.4.4, respectively. Both results demonstrate the superiority of the proposed model on the CPDP task. This is because LSTMs capture long-term dependencies in sequence data and extract features that contain source code structure and context information. It can be concluded that: (1) the TLSTM model has the advantage of preserving information, which better retains the semantic features related to software defects; (2) compared with a CPDP model trained with traditional metric features, model performance can be effectively improved by combining semantic features and metric features.
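The step of turning AST node contents into integer vectors can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: it uses Python's built-in `ast` module on Python snippets (the PROMISE projects named above are Java), it keeps every node type as a token (which node types the authors select is not stated here), and the helper names `ast_tokens`, `build_vocab`, and `encode` are hypothetical.

```python
import ast


def ast_tokens(source):
    """Return the node-type names of the parsed AST, in ast.walk order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def build_vocab(token_lists):
    """Map each distinct token to a positive integer id (0 is reserved)."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode(tokens, vocab, length):
    """Convert tokens to ids, then truncate or zero-pad to a fixed length.

    Tokens missing from the vocabulary map to 0, the same id as padding
    (a simplification chosen for this sketch).
    """
    ids = [vocab.get(tok, 0) for tok in tokens][:length]
    return ids + [0] * (length - len(ids))


# Two toy "source files" standing in for project code.
files = ["x = 1", "def f(a):\n    return a + 1"]
token_lists = [ast_tokens(src) for src in files]
vocab = build_vocab(token_lists)
vectors = [encode(tokens, vocab, length=8) for tokens in token_lists]
```

Each entry of `vectors` is a fixed-length list of integers, the kind of input an embedding layer followed by an LSTM would typically consume.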

Software defects are a critical issue in software development that can lead to system failures and cause significant financial losses. Predicting software defects is a vital aspect of ensuring software quality, and it can significantly reduce both the time and the overall cost of software testing. During the software defect prediction (SDP) process, automated tools attempt to predict defects in the source code based on software metrics. Several SDP models have been proposed to identify and prevent defects before they occur. In recent years, recurrent neural network (RNN) techniques have gained attention for their ability to handle sequential data and learn complex patterns. Still, these techniques are not always suitable for predicting software defects because of the problem of imbalanced data. To deal with this problem, this study combines a bidirectional long short-term memory (Bi-LSTM) network with oversampling techniques. To establish the effectiveness and efficiency of the proposed model, experiments were conducted on benchmark datasets obtained from the PROMISE repository. The experimental results were compared and evaluated in terms of accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), the area under the ROC curve (AUC), the area under the precision-recall curve (AUCPR), and mean square error (MSE). The average accuracy of the proposed model on the original and balanced datasets (using random oversampling and SMOTE) was 88%, 94%, and 92%, respectively. That is, the proposed Bi-LSTM on the balanced datasets (using random oversampling and SMOTE) improves the average accuracy by 6 and 4 percentage points compared to the original datasets. The average F-measure of the proposed model on the original and balanced datasets (using random oversampling and SMOTE) was 51%, 94%, and 92%, respectively. That is, the proposed Bi-LSTM on the balanced datasets (using random oversampling and SMOTE) improves the average F-measure by 43 and 41 percentage points compared to the original datasets. The experimental results demonstrate that combining the Bi-LSTM network with oversampling techniques positively affects defect prediction performance on datasets with imbalanced class distributions.
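Random oversampling, one of the two balancing techniques named above, can be sketched with the standard library alone. This is a minimal illustration, not the study's code: the function name `random_oversample`, the fixed seed, and the toy data are assumptions, and SMOTE (which synthesizes new minority samples instead of duplicating existing ones) is not shown.

```python
import random


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group feature rows by class label.
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)

    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw with replacement to fill the gap to the majority size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data: four clean modules (0) and one defective module (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
```

After the call, both classes have four rows. In practice, oversampling is usually applied to the training split only, so that duplicated rows do not also appear in the evaluation data.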

Articles published on PROMISE Repository

A hybrid‐ensemble model for software defect prediction for balanced and imbalanced datasets using AI‐based techniques with feature preservation: SMERKP‐XGB

Bioprospecting of Aspergillus sp. as a promising repository for anti-cancer agents: a comprehensive bibliometric investigation.

Predicting the Number of Software Faults using Deep Learning

Cross-Project Defect Prediction Using Transfer Learning with Long Short-Term Memory Networks

A trustworthy hybrid model for transparent software defect prediction: SPAM-XAI.

Anticancer, Immunomodulatory, and Phytochemical Screening of Carthamus oxyacantha M.Bieb Growing in the North of Iraq.

Ensemble learning based software defect prediction

Software defect prediction using a bidirectional LSTM network combined with oversampling techniques

Evolutionary measures and their correlations with the performance of cross‐version defect prediction for object‐oriented projects

A Novel Developed Supervised Machine Learning System For Classification And Prediction of Software Faults Using NASA Dataset

Software Testing: A Prediction Techniques

A novel approach for software defect prediction using CNN and GRU based on SMOTE Tomek method

Machine learning based Software Fault Prediction models

Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning.

Performance Evaluation of Convolutional Neural Network for Multi-Class in Cross Project Defect Prediction

Design of a Hybrid Machine Learning Base-Classifiers for Software Defect Prediction

A cognitive and neural network approach for software defect prediction

An Approach to Software Defect Prediction Combining Semantic Features and Code Changes

Using Cost-cognitive Bagging Ensemble to Improve Cross-project Defects Prediction

Feature Clustering and Ensemble Learning Based Approach for Software Defect Prediction
