Abstract
Cross-project defect prediction (CPDP) refers to predicting defects in a target project using prediction models trained from historical data of other source projects. And CPDP in the scenario where source and target projects have different metric sets is called heterogeneous defect prediction (HDP). Recently, HDP has received much research interest. Existing HDP methods only consider the linear correlation relationship among the features (metrics) of the source and target projects, and such models are insufficient to evaluate nonlinear correlation relationship among the features. So these methods may suffer from the linearly inseparable problem in the linear feature space. Furthermore, existing HDP methods do not take the class imbalance problem into consideration. Unfortunately, the imbalanced nature of software defect datasets increases the learning difficulty for the predictors. In this paper, we propose a new cost-sensitive transfer kernel canonical correlation analysis (CTKCCA) approach for HDP. CTKCCA can not only make the data distributions of source and target projects much more similar in the nonlinear feature space, where the learned features have favorable separability, but also utilize the different misclassification costs for defective and defect-free classes to alleviate the class imbalance problem. We perform the Friedman test with Nemenyi’s post-hoc statistical test and the Cliff’s delta effect size test for the evaluation. Extensive experiments on 28 public projects from five data sources indicate that: (1) CTKCCA significantly performs better than the related CPDP methods; (2) CTKCCA performs better than the related state-of-the-art HDP methods.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.