Abstract

Cross-project defect prediction (CPDP) identifies defect-prone software modules in one project (target) using historical data collected from other projects (source), which helps developers find bugs and prioritize their testing effort. CPDP has recently attracted considerable research interest. However, source and target data usually exhibit redundancy and nonlinearity. In addition, most CPDP methods do not exploit source label information to uncover the underlying knowledge for label propagation. These factors often lead to unsatisfactory CPDP performance. To address these limitations, we propose a landmark selection-based kernelized discriminant subspace alignment (LSKDSA) approach for CPDP. LSKDSA not only reduces the discrepancy between the data distributions of the source and target projects, but also characterizes complex data structures and increases the probability that the data are linearly separable. Moreover, LSKDSA encodes the label information of the source data into the domain adaptation learning process, which gives it good discriminant ability. Extensive experiments on 13 public projects from three benchmark datasets demonstrate that LSKDSA performs better than a range of competing CPDP methods, with improvements of 3.44%-11.23% in g-measure, 5.75%-11.76% in AUC, and 9.34%-33.63% in MCC.
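
As a reading aid for two of the reported metrics, the following is a minimal sketch of how g-measure and MCC can be computed from binary defect predictions. The abstract does not define g-measure; the sketch assumes the definition commonly used in defect prediction studies (the harmonic mean of recall and specificity). All function names are illustrative and are not taken from the paper.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = defective, 0 = clean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def g_measure(tp, tn, fp, fn):
    """Assumed definition: harmonic mean of recall (pd) and specificity (1 - pf)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    total = recall + specificity
    return 2 * recall * specificity / total if total else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

if __name__ == "__main__":
    # Toy labels, purely illustrative (not data from the paper).
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print("g-measure:", g_measure(*counts))
    print("MCC:", mcc(*counts))
```

Both metrics account for the clean (majority) class as well as the defective class, which is why they are often preferred over plain accuracy on imbalanced defect data.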
