Abstract

Various data preprocessing methods and classifiers have previously been proposed and evaluated for software defect prediction (SDP) across projects. These approaches have yielded relatively acceptable prediction results for different software projects. However, to the best of our knowledge, few researchers have combined data preprocessing with the construction of a robust classifier to improve prediction performance in SDP. Therefore, this paper presents a new integrated framework for predicting fault-prone software modules. The proposed framework consists of instance filtering, feature selection, instance reduction, and the construction of a new classifier. Additionally, a Kolmogorov-Smirnov test shows that the 21 main software metrics commonly follow a non-normal distribution. Therefore, the newly proposed classifier is built on the maximum correntropy criterion (MCC), which is well known for its effectiveness in handling non-Gaussian noise. To evaluate the new framework, a carefully designed experimental study uses nine open-source software projects with their 32 releases, obtained from the PROMISE data repository. Prediction accuracy is evaluated using the F-measure. State-of-the-art methods for Cross-Project Defect Prediction are also included for comparison. The experimental evidence supports the effectiveness and robustness of the new framework.
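As an illustration of the evaluation measure named above, the following minimal sketch computes the F-measure for the fault-prone class. It assumes the balanced form (the harmonic mean of precision and recall, often written F1); the abstract does not state which variant is used. The function name `f_measure` and the label vectors are hypothetical and chosen only for the example.

```python
def f_measure(actual, predicted, positive=1):
    """Balanced F-measure (harmonic mean of precision and recall).

    Assumption: the positive class is the fault-prone class, encoded as 1.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = fault-prone, 0 = not fault-prone.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
score = f_measure(actual, predicted)
print(f"F-measure: {score:.2f}")
```

Because the measure ignores true negatives, it tends to be more informative than plain accuracy when fault-prone modules are a small minority of the data.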

Highlights

  • In software engineering, software defect prediction (SDP) refers to accurately identifying faulty modules, which helps software engineers allocate limited testing resources to the most fault-prone modules prior to the testing and maintenance phases

  • This paper aims to integrate the advantages of k-means and the maximum correntropy criterion (MCC) to boost the potential of Cross-Project Defect Prediction (CPDP)

  • We propose a new CPDP approach called KMP, which consists of four stages: instance filtering by k-means, feature selection using correlation analysis, instance reduction based on random sampling, and building an MCC-based classifier to categorize modules of new software projects into the fault-prone (FP) class or the non-fault-prone (NFP) class
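To give a feel for why a correntropy-based criterion suits non-Gaussian noise, the sketch below compares the sample correntropy with the mean squared error on predictions that contain one gross outlier. It is not the KMP classifier itself: it assumes the commonly used Gaussian kernel without its normalizing constant and an arbitrary kernel width `sigma = 1.0`, and every name and value in it is hypothetical.

```python
import math


def gaussian_kernel(error, sigma):
    """Unnormalized Gaussian kernel; equals 1 at zero error and decays to 0."""
    return math.exp(-(error * error) / (2.0 * sigma * sigma))


def sample_correntropy(y_true, y_pred, sigma=1.0):
    """Average kernel value of the errors; maximizing it is the MCC objective."""
    total = sum(gaussian_kernel(t - p, sigma) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Conventional squared-error criterion, shown for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and two sets of predictions.
y_true = [0.0, 1.0, 0.0, 1.0]
clean = [0.1, 0.9, 0.1, 0.9]   # small errors everywhere
noisy = [0.1, 0.9, 0.1, 50.0]  # same, but with one gross outlier

mse_clean = mean_squared_error(y_true, clean)
mse_noisy = mean_squared_error(y_true, noisy)
corr_clean = sample_correntropy(y_true, clean)
corr_noisy = sample_correntropy(y_true, noisy)

print(f"MSE:         clean={mse_clean:.4f}  noisy={mse_noisy:.4f}")
print(f"Correntropy: clean={corr_clean:.4f}  noisy={corr_noisy:.4f}")
```

Each sample contributes a value between 0 and 1 to the correntropy, so a single outlier can lower it by at most 1/N, whereas the same outlier dominates the squared error. That bounded influence is the property a criterion of this kind relies on.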


Summary

Introduction

Software defect prediction (SDP) refers to accurately identifying faulty modules, which helps software engineers allocate limited testing resources to the most fault-prone modules prior to the testing and maintenance phases. In defect prediction, the most important step is to establish a robust defect prediction model (known as the defect predictor), which has motivated many researchers to design various predictors covering different aspects of software quality [1,2,3,4,5,6,7]. These models can generally be divided into two groups: Within-Project Defect Prediction (WPDP) and Cross-Project Defect Prediction (CPDP). WPDP may be restricted to some cases where the size of historical releases is limited

