A New Framework Consisted of Data Preprocessing and Classifier Modelling for Software Defect Prediction

Haijin Ji, Song Huang

doi:10.1155/2018/9616938

Abstract

Various data preprocessing methods and classifiers have been proposed and evaluated for software defect prediction (SDP) across projects. These approaches have produced relatively acceptable prediction results for different software projects. However, to the best of our knowledge, few researchers have combined data preprocessing with a robust classifier to improve prediction performance in SDP. Therefore, this paper presents a new, complete framework for predicting fault-prone software modules. The proposed framework consists of instance filtering, feature selection, instance reduction, and the construction of a new classifier. Additionally, a Kolmogorov-Smirnov test shows that the 21 main software metrics commonly follow non-normal distributions. Therefore, the newly proposed classifier is built on the maximum correntropy criterion (MCC), which is well known for its effectiveness in handling non-Gaussian noise. To evaluate the new framework, the experimental study is carefully designed using nine open-source software projects with 32 releases in total, obtained from the PROMISE data repository. Prediction accuracy is evaluated using the F-measure. State-of-the-art methods for Cross-Project Defect Prediction are also included for comparison. All of the evidence derived from the experiments verifies the effectiveness and robustness of our new framework.
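
To illustrate why a correntropy-based objective tolerates non-Gaussian noise, and how the F-measure is computed, the following is a minimal sketch rather than the classifier from the paper. It assumes the common unnormalised Gaussian-kernel estimator of correntropy over residuals; the function names, the kernel width `sigma`, and the toy residuals are hypothetical choices made for illustration only.

```python
import math


def correntropy(errors, sigma=1.0):
    """Empirical correntropy of residuals with an unnormalised Gaussian kernel.

    Assumption: V = mean(exp(-e^2 / (2 * sigma^2))). Each term lies in (0, 1],
    so a single huge residual can lower the mean by at most 1 / len(errors).
    """
    return sum(math.exp(-(e * e) / (2.0 * sigma * sigma)) for e in errors) / len(errors)


def mean_squared_error(errors):
    """Mean squared error, shown for contrast: it grows without bound."""
    return sum(e * e for e in errors) / len(errors)


def f_measure(y_true, y_pred):
    """F-measure: harmonic mean of precision and recall (1 = fault-prone)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical residuals: four small errors, then the same with one outlier.
clean = [0.1, -0.2, 0.05, 0.0]
noisy = clean + [50.0]

print(correntropy(clean), correntropy(noisy))
print(mean_squared_error(clean), mean_squared_error(noisy))
print(f_measure([1, 1, 0, 0], [1, 0, 1, 0]))
```

In this toy case the outlier inflates the mean squared error by several orders of magnitude, whereas the correntropy value changes only by a bounded amount. Maximising correntropy therefore gives little weight to samples with very large residuals.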

Highlights

In software engineering, software defect prediction (SDP) refers to accurately identifying faulty modules, which helps software engineers allocate limited testing resources to the most fault-prone modules prior to the testing and maintenance phases.
This paper aims to integrate the advantages of k-means and the maximum correntropy criterion (MCC) to boost the potential of Cross-Project Defect Prediction (CPDP).
We propose a new CPDP approach called KMP, which consists of four stages: instance filtering by k-means, feature selection using correlation analysis, instance reduction based on random sampling, and building an MCC-based classifier that categorizes modules of new software projects as fault-prone (FP) or non-fault-prone (NFP).
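
The three preprocessing stages named above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the KMP procedure itself, whose details are not given here. It assumes that instance filtering keeps source instances falling in a k-means cluster that also contains target instances, that correlation analysis ranks features by absolute Pearson correlation with the defect label, and that instance reduction is a uniform random sample. All function names, the farthest-point initialisation, and the toy data are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, iters=20):
    """Lloyd's k-means; farthest-point initialisation keeps the sketch deterministic."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def filter_source(source, target, k):
    """Stage 1 (assumed form): indices of source rows clustered with a target row."""
    labels = kmeans_labels(list(source) + list(target), k)
    target_clusters = set(labels[len(source):])
    return [i for i in range(len(source)) if labels[i] in target_clusters]


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def top_correlated_features(rows, labels, top_n):
    """Stage 2 (assumed form): columns with the largest |correlation| to the label."""
    scores = [abs(pearson(col, labels)) for col in zip(*rows)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_n])


def random_reduce(indices, fraction, seed=0):
    """Stage 3 (assumed form): uniform random sample of the remaining instances."""
    if not indices:
        return []
    keep = max(1, round(len(indices) * fraction))
    return sorted(random.Random(seed).sample(indices, keep))


# Hypothetical data: two metrics per module; labels 1 = FP, 0 = NFP.
source = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10)]
source_labels = [0, 1, 0, 1, 1, 0, 1]
target = [(0.5, 0.5), (0.2, 0.8)]

kept = filter_source(source, target, k=2)
features = top_correlated_features(
    [source[i] for i in kept], [source_labels[i] for i in kept], top_n=1
)
sample = random_reduce(kept, fraction=0.5)
print(kept, features, sample)
```

In this toy run, the source modules far from the target project are discarded first, so the later stages operate only on instances that resemble the target data. The fourth stage, the MCC-based classifier, would then be trained on the reduced sample.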

Summary

Introduction

Software defect prediction (SDP) refers to accurately identifying faulty modules, which helps software engineers allocate limited testing resources to the most fault-prone modules prior to the testing and maintenance phases. When predicting defects, the most important step is to establish a robust defect prediction model (known as the defect predictor). This has motivated many researchers to design predictors that cover different aspects of software quality [1,2,3,4,5,6,7]. These models can generally be divided into two groups: Within-Project Defect Prediction (WPDP) and Cross-Project Defect Prediction (CPDP). WPDP may be restricted in cases where the size of historical releases is limited.

Journal: Mathematical Problems in Engineering
Publication Date: Sep 6, 2018
Citations: 1
License type: CC BY 4.0
