Abstract

BACKGROUND: The availability of defect-related data from different projects has made cross-project defect prediction an open research issue. Many studies have focused on analyzing and improving the performance of cross-project defect prediction. OBJECTIVE: Multinomial classification has not been much explored in this setting. This paper focuses on multiclass (multinomial) classification for cross-project defect prediction. METHOD: Two ensemble-based statistical models, Gradient Boosting and Random Forest, are used for classification. An empirical study is carried out to determine the performance of multinomial classification for cross-project defect prediction. Depending on its number of defects, each class-level instance is assigned to one of three defined classes: class 0, class 1, or class 2. RESULTS & CONCLUSION: The major outcome is that multinomial/multiclass classification is applicable to cross-project data and yields results comparable to those obtained on within-project defect data.
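The mapping from defect counts to the three classes can be illustrated with a minimal sketch. The abstract does not state the cut-off values, so the threshold used here (`high_threshold=2`) and the function name `to_three_class` are hypothetical and chosen only for illustration.

```python
def to_three_class(defect_count, high_threshold=2):
    """Map a defect count to class 0, 1, or 2.

    ASSUMPTION: the cut-offs are illustrative, not taken from the paper.
      class 0: no defects
      class 1: at least one defect, but fewer than `high_threshold`
      class 2: `high_threshold` or more defects
    """
    if defect_count < 0:
        raise ValueError("defect_count must be non-negative")
    if defect_count == 0:
        return 0
    if defect_count < high_threshold:
        return 1
    return 2


# Toy defect counts for a handful of software classes (made-up values).
toy_defect_counts = [0, 1, 2, 7]
toy_labels = [to_three_class(n) for n in toy_defect_counts]
```

With a different `high_threshold`, the same function produces a different balance between class 1 and class 2, which is one reason the choice of cut-offs matters for a multiclass study.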

Highlights

  • Identifying defect-prone classes before actual testing reduces testing cost and effort

  • End for
    # Defining the training and the testing data for Cross Project Defect Prediction
    for project belongs to CP_test
        CP_train = CP_data – data
        CP_test = data
    # Defining the training and the testing data for Within Project Defect Prediction
    for project belongs to WP_test
        splitting training and testing data with 60:40 ratio
        train = {selecting 60% of data us }
        test = training_data – train
    # Modeling: applying machine learning algorithms
    model1 = random_forest_model_training(train)
    model2 = gradient_boosting_model_training(train)
    Applying Grid Search technique to tune the hyper-parameters for both models
    Predicting the results on test data
    Conversion of 10-class to 3-class and re-train the models again with above steps
    Evaluating the models using metrics {auc, precision, recall, f1score}

  • In this paper, we analyzed the performance of multinomial classification for cross-project and within-project defect prediction
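The two data-splitting schemes named in the pseudocode highlight above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function names, the toy project names, and the row layout are all assumptions, and the model-training, grid-search, and evaluation steps are left out.

```python
import random


def cross_project_split(projects, target):
    """Cross-project setting: train on all other projects, test on `target`."""
    train = [
        row
        for name, rows in projects.items()
        if name != target
        for row in rows
    ]
    test = list(projects[target])
    return train, test


def within_project_split(rows, train_fraction=0.6, seed=0):
    """Within-project setting: shuffle one project's rows and split them 60:40."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data: three projects with five rows each.
# Each row is (project_name, row_index); real rows would hold code metrics and a label.
toy_projects = {
    name: [(name, i) for i in range(5)]
    for name in ("project_a", "project_b", "project_c")
}

cp_train, cp_test = cross_project_split(toy_projects, "project_b")
wp_train, wp_test = within_project_split(toy_projects["project_a"])
```

In the cross-project case the target project contributes no rows to training, whereas in the within-project case both partitions come from the same project. That difference is what the comparison between the two settings measures.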


Received on 10 May 2019, accepted on 27 August 2019, published on 09 September 2019

Introduction
State Of Art
Metrics used and description of Datasets
Ensemble Learning Models
Evaluation Measures
Data Preprocessing and Preparation
Model Fitting
Model Evaluation
Performance Results
Results & Discussion
Threats To Validity
Conclusion & Future Scope
