Abstract

Software defect prediction aims to identify defect-prone code regions automatically before defects are discovered. Accurate prediction helps software practitioners prioritize their testing efforts. In recent decades, dozens of approaches have been proposed and have achieved good results in this field. However, in practical scenarios, many projects have few labeled instances; moreover, most of these labeled instances are nondefective. The scarcity of training data and the class imbalance problem together pose serious challenges to software defect prediction. So far, few prevailing approaches handle these two difficulties well simultaneously. One important reason is that they do not pay adequate attention to the few key instances that are difficult to classify in a small, imbalanced dataset. This article introduces the concept of “instance hardness” to integrate the various difficulties of imbalanced classification tasks. Based on it, a novel imbalance learning framework named self-paced ensemble of ensembles (SPE$^{2}$) is proposed to perform software defect prediction. SPE$^{2}$ aims to generate a strong ensemble of ensembles by harmonizing instance hardness in a self-paced manner via undersampling. Finally, SPE$^{2}$ is extensively compared with eight imbalance learning approaches on ten open-source defect datasets. Experiments indicate that SPE$^{2}$ achieves significantly better F-measure values than its existing counterparts, according to Brunner’s statistical significance test and Cliff’s effect sizes.
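
The abstract does not spell out how instance hardness is computed or how the undersampling is harmonized, so the following is only a minimal sketch under stated assumptions, not the SPE$^{2}$ algorithm itself. It assumes hardness is the absolute error between a classifier's predicted defect probability and the true label, that nondefective (majority) instances are grouped into equal-width hardness bins, and that each bin is sampled in proportion to 1 / (mean bin hardness + alpha), where alpha is a self-paced factor that grows across iterations. The function names, the binning scheme, and the tangent schedule are all illustrative choices.

```python
import math
import random


def instance_hardness(probs, labels):
    """Assumed hardness: absolute error between predicted probability and label."""
    return [abs(p - y) for p, y in zip(probs, labels)]


def self_paced_alpha(iteration, n_iterations):
    """Assumed schedule: alpha grows from 0 toward infinity as training proceeds.

    Valid for 0 <= iteration < n_iterations.
    """
    return math.tan(math.pi * iteration / (2 * n_iterations))


def harmonized_undersample(hardness, n_keep, n_bins=5, alpha=0.0, rng=None):
    """Select roughly n_keep majority-class indices, bin by bin.

    Each non-empty hardness bin receives a quota proportional to
    1 / (mean bin hardness + alpha). A small alpha favours easy bins;
    a large alpha spreads the quota almost evenly over all bins.
    Rounding and small bins can make the total differ slightly from n_keep.
    """
    rng = rng or random.Random(0)
    bins = [[] for _ in range(n_bins)]
    for idx, h in enumerate(hardness):
        b = min(int(h * n_bins), n_bins - 1)  # hardness is assumed to lie in [0, 1]
        bins[b].append(idx)

    weights = []
    for members in bins:
        if members:
            mean_h = sum(hardness[i] for i in members) / len(members)
            weights.append(1.0 / (mean_h + alpha + 1e-12))  # guard against division by zero
        else:
            weights.append(0.0)
    total = sum(weights)

    chosen = []
    for members, w in zip(bins, weights):
        quota = min(len(members), round(n_keep * w / total))
        chosen.extend(rng.sample(members, quota))
    return sorted(chosen)


# Toy majority class: 50 easy, 10 borderline, and 5 hard nondefective instances.
toy_hardness = [0.05] * 50 + [0.5] * 10 + [0.95] * 5
early = harmonized_undersample(toy_hardness, n_keep=20, alpha=0.1)
late = harmonized_undersample(toy_hardness, n_keep=20, alpha=100.0)
```

In this toy run, the early subset (small alpha) is drawn mostly from the easy bin, while the late subset (large alpha) takes a nearly equal share from every bin, so the hard instances carry more weight in later base learners. Each undersampled majority subset would then be combined with all defective instances to train one member of the ensemble.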
