Abstract
Unsupervised Effort-Aware Defect Prediction (EADP) builds a model from unlabeled data and ranks software modules according to their feature values. Xu et al. (JSS 2021) explored clustering techniques for unsupervised defect prediction and found that several clustering methods perform better on the F1@20% effort-aware metric. However, their conclusion may not be convincing, because they did not consider the impact of the Initial False Alarms (IFA) metric on unsupervised EADP. Furthermore, their study did not compare against state-of-the-art supervised EADP models. To investigate clustering techniques for unsupervised EADP more comprehensively, we evaluate 22 clustering techniques using three classification metrics and six effort-aware metrics. The experimental results show that (1) K-medoids, the best clustering technique for unsupervised EADP, significantly reduces the IFA of the ManualUp method to an acceptable range, whereas the clustering techniques recommended by Xu et al. yield IFA values too high for testing teams to accept; (2) K-medoids outperforms some supervised EADP methods, especially on IFA and PMI@20% (the Proportion of Modules Inspected when inspecting the top 20% of lines of code); and (3) clustering techniques with better classification performance could achieve better effort-aware performance. In summary, we recommend the K-medoids clustering technique for unsupervised EADP and suggest that future research devote more effort to exploring better unsupervised clustering techniques. To support reproducibility and future research, we provide the source code used in our study (https://github.com/Andre-Yang816/Clustering4UEADP).
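To make the effort-aware metrics named above more concrete, the following Python sketch computes IFA and the @20% metrics (Recall, Precision, F1, and PMI) for a ranked list of modules. It assumes that ManualUp ranks modules by ascending lines of code (LOC), and that the 20% budget covers only those modules whose cumulative LOC does not exceed 20% of the total. Both are common conventions in the EADP literature, but they are assumptions here and not necessarily the exact implementation used in the study. The `Module` type, the function names, and the toy data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Module:
    name: str
    loc: int          # lines of code, used as the inspection effort
    defective: bool   # ground-truth label, used only for evaluation


def manual_up(modules: Sequence[Module]) -> List[Module]:
    """Rank modules smallest first (assumed ManualUp ordering by ascending LOC)."""
    return sorted(modules, key=lambda m: m.loc)


def ifa(ranking: Sequence[Module]) -> int:
    """Initial False Alarms: clean modules inspected before the first defective one."""
    for i, m in enumerate(ranking):
        if m.defective:
            return i
    return len(ranking)  # no defective module: every inspected module is a false alarm


def effort_aware_at(ranking: Sequence[Module], effort: float = 0.2) -> Dict[str, float]:
    """Recall, precision, F1 and PMI when inspecting the top `effort` share of total LOC.

    Assumption: a module is inspected only if it fits entirely within the LOC budget.
    """
    total_loc = sum(m.loc for m in ranking)
    total_defective = sum(m.defective for m in ranking)
    budget = effort * total_loc
    inspected: List[Module] = []
    used = 0
    for m in ranking:
        if used + m.loc > budget:
            break
        inspected.append(m)
        used += m.loc
    found = sum(m.defective for m in inspected)
    recall = found / total_defective if total_defective else 0.0
    precision = found / len(inspected) if inspected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    pmi = len(inspected) / len(ranking) if ranking else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "pmi": pmi}


# Toy data (invented for illustration only).
modules = [
    Module("a", 10, False),
    Module("b", 20, True),
    Module("c", 30, False),
    Module("d", 40, True),
    Module("e", 500, False),
]
ranking = manual_up(modules)
print("IFA:", ifa(ranking))
print("@20%:", effort_aware_at(ranking))
```

Here the 20% budget is 120 LOC, so modules a through d are inspected: both defective modules are found (Recall@20% = 1.0), but 80% of the modules must be examined (PMI@20% = 0.8), and one clean module is seen before the first defect (IFA = 1). The metric functions do not depend on how the ranking was produced, so a ranking derived from a clustering technique such as K-medoids could be evaluated the same way.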