Abstract

Effectively detecting the root cause of the faults that cause software crashes (i.e., crashing faults) is a critical quality assurance activity. Previous studies extracted features to characterise crash instances and built models to identify whether a crashing fault resides inside the stack trace. These models are all supervised learning methods, which require labelled crash data. This study proposes an unsupervised model, called Transfer Spectral Clustering (TSC), for crashing fault residence identification when only unlabelled data are available. Unlike traditional unsupervised methods, which are applied to the data of an individual project, TSC transfers knowledge from auxiliary unlabelled data of a source project to assist the clustering of unlabelled data from the target project. As an unsupervised transfer learning method, TSC simultaneously considers the data manifold information of each individual project and the feature manifold information across projects to improve clustering. Extensive experiments are conducted on a benchmark dataset containing seven software projects, with five indicators used for performance evaluation. The results show that TSC outperforms four clustering-based unsupervised methods and achieves performance competitive with eight supervised cross-project methods.

