Combating class imbalance problem in semi-supervised defect detection

Ying Ma,Jiong Li,Guangchun Luo,Aiguo Chen

doi:10.1109/iccps.2011.6092260

Abstract

Detection of defect-prone software modules is an important topic in software quality research, and widely studied under enough defect data circumstance. An improved semi-supervised learning approach for defect detection involving class imbalanced and limited labeled data problem has been proposed. This approach employs random under-sampling technique to resample the original training set and updating training set in each round for co-train style algorithm. In comparison with conventional machine learning approaches, our method has significant superior performance in the aspect of AUC (area under the receiver operating characteristic) metric. Experimental results also show that with the proposed learning approach, it is possible to design better method to tackle the class imbalanced problem in semi-supervised learning.

Full Text