Software Defect Prediction Research Articles

Abstract: Software cross-project defect prediction (CPDP) uses cross-project (CP) data to overcome the lack of data needed to train well-performing software defect prediction (SDP) classifiers in the early stages of new software projects. Because the CP data (known as the source) may differ from the new project's data (known as the target), CPDP classifiers find it difficult to perform well. In particular, this difficulty arises from a mismatch between the data distributions of the source and the target. Transfer learning-based CPDP classifiers are designed to minimize these distribution differences. The first transfer learning-based CPDP classifiers treated the marginal and conditional distribution differences equally, which degraded prediction performance. To address this, recent research has proposed the Weighted Balanced Distribution Adaptation (W-BDA) method, which weighs the relative importance of both distribution differences to improve classification performance. Although W-BDA has been shown to improve model performance in CPDP and to tackle class imbalance by balancing the class proportions of each domain, research to date has not examined model performance as the amount of target data increases. We provide the first investigation of the effects of increasing target data when leveraging the importance of both distribution differences. We extend the original W-BDA method and call this extension W-BDA+. To evaluate how effectively W-BDA+ improves CPDP performance, we conduct eight experiments on 18 projects from four datasets, with data sampling performed using different sampling methods. Data sampling was applied only to the baseline methods, not to the proposed W-BDA+ or the original W-BDA, because data sampling issues do not arise for these two methods. We evaluate our method using four complementary indicators: Balanced Accuracy, AUC, F-measure and G-measure.
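
To make the idea of weighting "both distribution differences" concrete, here is a minimal Python sketch of the balanced distance used by Balanced Distribution Adaptation (BDA), on which W-BDA builds: a balance factor `mu` trades the marginal distribution difference off against the class-conditional one. Assumptions not taken from the abstract: a linear kernel (so each MMD term is a squared distance between means), no learned projection, target pseudo-labels supplied by the caller, and none of W-BDA's class weighting for imbalance. `balanced_distance` and its helpers are hypothetical names.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _mean(rows: Sequence[Vector]) -> List[float]:
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance (= linear-kernel MMD between two means)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def balanced_distance(
    xs: Sequence[Vector],
    ys: Sequence[int],
    xt: Sequence[Vector],
    yt_pseudo: Sequence[int],
    mu: float,
) -> float:
    """BDA-style distance: (1 - mu) * marginal + mu * conditional discrepancy.

    ys are the source labels; yt_pseudo are *pseudo*-labels for the target
    (assumed to come from a classifier trained on the source), because
    target labels are unknown in CPDP.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    marginal = _sq_dist(_mean(xs), _mean(xt))
    conditional = 0.0
    # Only classes present in both domains contribute a class-wise term.
    for c in set(ys) & set(yt_pseudo):
        src_c = [x for x, y in zip(xs, ys) if y == c]
        tgt_c = [x for x, y in zip(xt, yt_pseudo) if y == c]
        conditional += _sq_dist(_mean(src_c), _mean(tgt_c))
    return (1.0 - mu) * marginal + mu * conditional


if __name__ == "__main__":
    # Tiny two-feature example: label 1 = defective, 0 = clean.
    xs, ys = [[0, 0], [1, 0], [4, 4], [5, 4]], [0, 0, 1, 1]
    xt, yt = [[1, 1], [2, 1], [5, 5]], [0, 0, 1]
    for mu in (0.0, 0.5, 1.0):
        print(mu, round(balanced_distance(xs, ys, xt, yt, mu), 3))
```

Setting `mu = 0` keeps only the marginal term and `mu = 1` only the conditional term; a fixed `mu = 0.5` weights the two equally, which is the situation the abstract attributes to early transfer learning-based CPDP classifiers. In BDA itself this distance is minimized over a learned projection rather than merely evaluated on raw features.
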
Our findings reveal average improvements of 6%, 7.5%, 10% and 12% on these four indicators when W-BDA+ is compared with the original W-BDA and five other baseline methods (across all four sampling methods used). Also, as the target-to-source ratio increases under different sampling methods, we observe a decrease in performance for the original W-BDA, with our W-BDA+ approach outperforming the original W-BDA in most cases. Our results highlight the importance of being aware of the effect of increasing target-data availability in CPDP scenarios when using a method that can handle the class imbalance problem.
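
The four indicators can be computed from a classifier's outputs as in the following Python sketch, which treats label 1 as defective. The G-measure definition used here (harmonic mean of recall and 1 - pf, where pf is the false-alarm rate) is the one commonly used in defect-prediction studies and is an assumption, since the abstract does not give its formula. AUC is computed as the probability that a randomly chosen defective module scores higher than a randomly chosen clean one, counting ties as one half. All function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) with 1 = defective and 0 = clean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def _div(a: float, b: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return a / b if b else 0.0


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Balanced accuracy, F-measure and (assumed) G-measure from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    recall = _div(tp, tp + fn)       # pd: probability of detection
    specificity = _div(tn, tn + fp)  # 1 - pf (probability of false alarm)
    precision = _div(tp, tp + fp)
    return {
        "balanced_accuracy": (recall + specificity) / 2.0,
        "f_measure": _div(2 * precision * recall, precision + recall),
        # Assumed G-measure: harmonic mean of pd and (1 - pf).
        "g_measure": _div(2 * recall * specificity, recall + specificity),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of a defective module > score of a clean one)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(threshold_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise AUC loop costs O(defective × clean) comparisons, which is fine for illustration; on real data a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.
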

Purpose: Software defect prediction (SDP) is a critical aspect of software quality assurance, aiming to identify and manage potential defects in software systems. In this paper, we propose a novel hybrid approach for SDP that combines Grey Wolf Optimization with Feature Selection (GWOFS) and a multilayer perceptron (MLP). The GWOFS-MLP hybrid model is designed to optimize feature selection, ultimately improving the accuracy and efficiency of SDP. Grey Wolf Optimization, inspired by the social hierarchy and hunting behavior of grey wolves, is used to select a subset of relevant features from a large pool of potential predictors. This study investigates the key challenges that traditional SDP approaches face and proposes solutions to the problems of time complexity and the curse of dimensionality.

Design/methodology/approach: The integration of GWOFS and MLP yields a robust hybrid model that can adapt to diverse software datasets. The feature selection process mimics the cooperative hunting behavior of wolves, allowing critical feature combinations to be explored. The selected features are then fed into an MLP, an artificial neural network (ANN) capable of learning intricate patterns within software metrics. The MLP serves as the predictive engine, using the curated feature set to model and classify software defects.

Findings: Evaluation of the GWOFS-MLP hybrid model on a real-world software defect dataset demonstrates its effectiveness. The model achieves a training accuracy of 97.69% and a testing accuracy of 97.99%. Additionally, a receiver operating characteristic area under the curve (ROC-AUC) score of 0.89 indicates the model's ability to discriminate between defective and defect-free software components.

Originality/value: Experimental implementations using machine learning-based techniques with feature reduction are conducted to validate the proposed solutions. The goal is to enhance the accuracy, relevance and efficiency of SDP, ultimately improving software quality assurance processes. The confusion matrix further illustrates the model's performance, with only a small number of false positives and false negatives.
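
The abstract describes Grey Wolf Optimization only at a high level. The following Python sketch shows one common way to use a basic Grey Wolf Optimizer for wrapper feature selection: each wolf is a point in [0, 1]^d, a feature is kept when its coordinate exceeds 0.5, and every wolf moves toward the average of three estimates guided by the alpha, beta and delta leaders while the coefficient `a` shrinks linearly from 2 to 0. The binarization rule, population size, iteration count and the toy objective are assumptions, and `gwo_feature_selection`, `to_mask` and `toy_fitness` are hypothetical names; in the paper's setting the fitness would presumably involve the MLP's error on the selected features, which is not reproduced here.

```python
import random
from typing import Callable, List, Tuple

Position = List[float]


def to_mask(position: Position) -> List[bool]:
    # Assumed binarization: keep feature j when its coordinate exceeds 0.5.
    return [p > 0.5 for p in position]


def gwo_feature_selection(
    n_features: int,
    fitness: Callable[[List[bool]], float],
    n_wolves: int = 10,
    n_iter: int = 30,
    seed: int = 0,
) -> Tuple[List[bool], float]:
    """Minimize `fitness` over feature masks with a basic Grey Wolf Optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    leaders: List[Tuple[float, Position]] = []  # alpha, beta, delta (best first)

    def refresh_leaders() -> None:
        # Score the pack and keep the best three positions seen so far.
        nonlocal leaders
        scored = [(fitness(to_mask(w)), list(w)) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda s: s[0])[:3]

    for t in range(n_iter):
        refresh_leaders()
        a = 2.0 - 2.0 * t / n_iter  # exploration coefficient: 2 -> 0
        for i, wolf in enumerate(wolves):
            new_position = []
            for j in range(n_features):
                estimates = []
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a  # uniform in [-a, a]
                    C = 2.0 * rng.random()          # uniform in [0, 2]
                    D = abs(C * leader[j] - wolf[j])
                    estimates.append(leader[j] - A * D)
                # Average the leader-guided moves and clip to [0, 1].
                x = sum(estimates) / len(estimates)
                new_position.append(min(1.0, max(0.0, x)))
            wolves[i] = new_position
    refresh_leaders()  # score the final positions too
    best_score, best_position = leaders[0]
    return to_mask(best_position), best_score


# Hypothetical toy objective: features 0 and 2 are "useful"; every selected
# feature costs a little, mimicking "validation error + subset-size penalty".
RELEVANT = {0, 2}


def toy_fitness(mask: List[bool]) -> float:
    missing = sum(1 for j in RELEVANT if not mask[j])
    return 0.4 * missing + 0.01 * sum(mask)


if __name__ == "__main__":
    mask, score = gwo_feature_selection(6, toy_fitness, n_wolves=20, n_iter=50)
    print("selected features:", [j for j, keep in enumerate(mask) if keep])
    print("fitness:", round(score, 3))
```

Replacing `toy_fitness` with, for example, cross-validated MLP error plus a small per-feature penalty turns this into a GWOFS-style wrapper; the penalty term is what pushes the search toward smaller feature subsets.
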

Articles published on Software Defect Prediction

RETRACTED: Hybridization of fuzzy rough feature selection with ANFIS and turbulent flow of water optimization for managing software defect prediction uncertainty

Predicting the Number of Software Faults using Deep Learning

Insights of effectivity analysis of learning-based approaches towards software defect prediction

Improving transfer learning for software cross-project defect prediction

An optimized deep learning method for software defect prediction using Whale Optimization Algorithm

Enhancing Software Defect Prediction accuracy using Modified Entropy Calculation in Random Forest Algorithm

Effect of Data Sampling on Cone Shaped Embedded Normalization in Just in Time Software Defect Prediction

Comparative Study of Various Hyperparameter Tuning on Random Forest Classification With SMOTE and Feature Selection Using Genetic Algorithm in Software Defect Prediction

A hybrid approach for optimizing software defect prediction using a grey wolf optimization and multilayer perceptron

An empirical study of just-in-time-defect prediction using various machine learning techniques

Efficient Cross-Project Software Defect Prediction Based on Federated Meta-Learning

An empirically based object-oriented testing using Machine learning

Neighbor cleaning learning based cost‐sensitive ensemble learning approach for software defect prediction

Application of Weighted Combinations of Activation Functions to Defect Prediction in Software Development

Ensemble Kernel-Mapping-Based Ranking Support Vector Machine for Software Defect Prediction

Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning

Alleviating Class Imbalance Issue in Software Fault Prediction Using DBSCAN-Based Induced Graph Under-Sampling Method

Hybrid deep architecture for software defect prediction with improved feature set

Interpretable Software Defect Prediction from Project Effort and Static Code Metrics

Development of Honey Badger-Cat Swarm Optimisation-Based Parallel Cascaded Deep Network for Software Bug Prediction Framework