Cross-project Prediction Research Articles

Most defect prediction methods rely on traditional, manually designed static code metrics. However, using only these hand-crafted features is impractical. Some researchers use a Convolutional Neural Network (CNN) to capture latent semantic information from a program’s Abstract Syntax Trees (ASTs). In recent years, leveraging the dependency relationships between software modules to construct a software network, and using network embedding models to capture its structural information, has also proved helpful in defect prediction. This paper takes both semantic and structural information into account and proposes a method called CGCN. The study aims to validate the feasibility and performance of the proposed method in software defect prediction.

ASTs and a Class Dependency Network (CDN) are first generated from the source code. For the ASTs, symbolic tokens are extracted and encoded into numerical vectors, which are then fed to the CNN to capture semantic information. For the CDN, a Graph Convolutional Network (GCN) automatically learns the structural information of the network. The learned semantic and structural information are then combined with different weights. Finally, the learned features are concatenated with traditional hand-crafted features to train a classifier for more accurate defect prediction.

The proposed method outperforms state-of-the-art defect prediction models for both within-project prediction (including within-version and cross-version) and cross-project prediction on 21 datasets (three versions each of seven open-source Java projects). Of the three prediction tasks, within-version prediction generally achieves the best performance. Combining semantic and structural information can improve the performance of software defect prediction.

• A novel defect prediction model (CGCN) that combines CNN and GCN. The proposed model extracts semantic information from the ASTs of the source code as well as structural information between modules from the software network, which improves defect prediction performance.
• Different weight parameters are used for the semantic and structural features so that they contribute differently in different tasks, improving the model's generalization ability.
• The efficiency of CGCN is evaluated on seven Java open-source projects (three versions per project, 21 datasets in total) and three tasks (within-version, cross-version and cross-project defect prediction). The results demonstrate that the CGCN variant with weight parameters outperforms the state-of-the-art methods, while the variant without weight parameters outperforms the benchmarks only in within-version defect prediction.
• Unlike “GCN2defect: Graph Convolutional Networks for SMOTETomek-based Software Defect Prediction”, presented at the 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), this paper combines the semantic features of the source code with the software network features and extends the evaluation to cross-version and cross-project prediction. Experimental results show that the proposed method is more robust.
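The structural and fusion steps described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `gcn_layer` uses the standard symmetric-normalized graph convolution, the dependency graph is treated as undirected, the weights are random rather than learned, and `fuse` with its `alpha` parameter is a hypothetical stand-in that assumes the weighted combination is a convex sum of equally sized vectors (the abstract does not say how the weights are applied).

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)

def fuse(semantic, structural, handcrafted, alpha=0.5):
    """Weight the two learned views, then append hand-crafted metrics.

    alpha is a hypothetical mixing weight (an assumption of this sketch).
    """
    if semantic.shape != structural.shape:
        raise ValueError("semantic and structural features must match in shape")
    combined = alpha * semantic + (1.0 - alpha) * structural
    return np.concatenate([combined, handcrafted], axis=1)

# Toy class dependency network with three classes: 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
structural = gcn_layer(adj, np.eye(3), rng.normal(size=(3, 4)))
semantic = rng.normal(size=(3, 4))      # stand-in for CNN output per class
handcrafted = rng.normal(size=(3, 5))   # stand-in for static code metrics
fused = fuse(semantic, structural, handcrafted, alpha=0.7)
print(fused.shape)
```

Each row of `fused` is the feature vector for one class and could be passed to any off-the-shelf classifier. A larger `alpha` makes the semantic view dominate, a smaller one favors the structural view.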

• We propose a two-phase transfer boosting-based cross-project prediction model.
• The proposed model is assessed using non-effort based and effort-based measures.
• Training data weights are based on feature importance and inter-project similarity.
• We validate the effectiveness of the proposed model on a large corpus of data.

Recent years have witnessed a growing trend in cross-project defect prediction (CPDP), where the training and testing data come from different projects with different data distributions. Several CPDP methods have been presented in the literature to overcome these distribution differences, but the majority of existing approaches have been evaluated assuming unlimited inspection effort, which is practically impossible and thus leads to fallacious conclusions. Further, they focused more on improving Recall than Precision, leading to a high probability of false alarm (PF) and significant wastage of developers' effort and time.

Addressing these issues, we propose a Two-Phase Transfer Boosting (TPTB) model, which aims to improve performance not only in terms of non-effort based measures (NEBMs), balancing Recall and PF, but also in terms of effort-based measures (EBMs), which assume limited inspection effort. To mitigate the distribution differences, the first phase assigns initial weights to the training modules based on the feature distribution and feature importance. The second phase applies the Dynamic Transfer AdaBoost algorithm to build an ensemble classifier that lessens the impact of contradictory training modules. In addition, a sorting strategy is designed to prioritize the modules for further inspection.

Statistical results on 62 datasets show that, considering NEBMs, our TPTB model achieves a better-balanced overall performance than NN-filter, ManualDown, EASC, and the Cruz model, with performance comparable to within-project defect prediction (WPDP). When EBMs are considered as well, TPTB shows a statistically and practically more balanced performance than ManualUP and Cruz, with overall performance comparable to EASC. Our results demonstrate the efficacy of the TPTB model in a practical setting, empowering the quality assurance team to predict and prioritize defective modules and to allocate limited inspection effort by focusing on highly defective modules.
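The two kinds of measures contrasted in this abstract can be made concrete with a small pure-Python sketch. `recall_and_pf` follows the standard confusion-matrix definitions. `recall_within_effort` is a hypothetical illustration of an effort-based measure: it assumes inspection effort is proportional to lines of code, ranks modules by predicted score per line, and uses a 20% budget. None of these choices is taken from the abstract, and this is not the sorting strategy of the TPTB model.

```python
def recall_and_pf(y_true, y_pred):
    """Recall = TP / (TP + FN); PF = FP / (FP + TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return recall, pf

def recall_within_effort(y_true, scores, loc, budget=0.2):
    """Share of defective modules found within a LOC inspection budget.

    Modules are inspected in descending order of score per line of code
    (an assumed ranking rule) until the next module would exceed the budget.
    """
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i] / loc[i], reverse=True)
    limit = budget * sum(loc)
    spent = 0
    found = 0
    for i in order:
        if spent + loc[i] > limit:
            break
        spent += loc[i]
        found += y_true[i]
    total = sum(y_true)
    return found / total if total else 0.0

# Toy data: four modules, two of them defective.
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
scores = [0.9, 0.2, 0.8, 0.1]   # predicted defect probabilities
loc = [10, 50, 100, 40]         # module sizes in lines of code

print(recall_and_pf(y_true, y_pred))
print(recall_within_effort(y_true, scores, loc))
```

In the toy data the second defective module is large, so it does not fit in the 20% budget even though its score is high. This is the kind of gap between non-effort based and effort-based evaluation that the abstract points to.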

Articles published on Cross-project Prediction

KCO: Balancing class distribution in just-in-time software defect prediction using kernel crossover oversampling.

Automatic prediction of developers’ resolutions for software merge conflicts

SMOTE-Based Homogeneous Prediction for Aging-Related Bugs in Cloud-Oriented Software

Cross-project prediction for rock mass using shuffled TBM big dataset and knowledge-based machine learning methods

Software defect prediction with semantic and structural information of codes based on Graph Neural Networks

Towards building a pragmatic cross-project defect prediction model combining non-effort based and effort-based performance measures for a balanced evaluation

Are Source Code Metrics “Good Enough” in Predicting Security Vulnerabilities?

Security versus Compliance: An Empirical Study of the Impact of Industry Standards Compliance on Application Security

Local modeling approach for cross-project defect prediction

Deep Cross-Project Software Reliability Growth Model Using Project Similarity-Based Clustering

Why do builds fail?—A conceptual replication study

Within-project and cross-project software defect prediction based on improved transfer Naive Bayes algorithm

A systematic review of unsupervised learning techniques for software defect prediction

An empirical study of factors affecting cross-project aging-related bug prediction with TLAP

Substantiation of multinomial classification using ensemble learning approach

Cross-project bug type prediction based on transfer learning

Neural Network-based Detection of Self-Admitted Technical Debt

Collective transfer learning for defect prediction

Whom are you going to call? determinants of @-mentions in Github discussions

Too trivial to test? An inverse view on defect prediction to identify methods with low fault risk.
