Abstract

Cross-Project Defect Prediction (CPDP) aims to leverage knowledge from label-rich source software projects to improve defect prediction in a label-poor target software project. Existing CPDP methods have two major flaws. First, previous CPDP methods consider only global feature representations and ignore the local relationships between instances of the same category drawn from different projects, which leads to ambiguous predictions near the decision boundary. Second, pseudo-label-based CPDP methods assume that the conditional distributions can be matched in one stroke, which holds only when the target instances are assigned correct pseudo-labels. However, because of the large gap between projects, the pseudo-labels often deviate substantially from the true labels. To address these issues, this paper proposes a novel CPDP method named Joint Feature Representation with Double Marginalized Denoising Autoencoders (DMDA-JFR). Our method consists of two main parts: joint feature representation learning and progressive distribution matching. We use two novel autoencoders to learn global and local feature representations jointly. To achieve progressive distribution matching, we introduce a repetitious pseudo-labeling strategy, which allows the distributions to be matched after each stacked layer is learned rather than in one stroke. We evaluated the proposed method through experiments on 10 open-source projects comprising 29 software releases from the PROMISE repository. Overall, the experimental results show that our method outperforms several state-of-the-art baseline CPDP methods. We conclude that (1) joint deep representations are more promising for CPDP than methods that consider only global feature representations, and (2) progressive distribution matching adapts probability distributions in CPDP more effectively than existing pseudo-label-based CPDP methods.
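
The abstract does not give the layer equations. As a point of reference, the standard marginalized denoising autoencoder (mDA; Chen et al., ICML 2012), which the method's name suggests it builds on, has a closed-form solution per layer. The NumPy sketch below shows only that standard layer, applied to source and target instances concatenated column-wise; it is not the "double" autoencoder or the local-relationship term of DMDA-JFR. The function name `mda_layer`, the corruption probability `p`, the 1e-5 ridge term, and the synthetic "project" matrices are assumptions for illustration.

```python
import numpy as np


def mda_layer(X, p):
    """One standard marginalized denoising autoencoder layer (closed form).

    X: (d, n) array, features as rows and instances as columns.
    p: probability of zeroing each input feature (the marginalized corruption).
    Returns the mapping W of shape (d, d + 1) and the hidden layer tanh(W [X; 1]).
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])         # append a constant bias feature
    q = np.full((d + 1, 1), 1.0 - p)             # survival probability per feature
    q[-1] = 1.0                                  # the bias feature is never corrupted
    S = Xb @ Xb.T                                # scatter matrix
    Q = S * (q @ q.T)                            # expected corrupted scatter (off-diagonal)
    np.fill_diagonal(Q, q.ravel() * np.diag(S))  # diagonal scales by q_i, not q_i**2
    P = S[:-1, :] * q.T                          # expected clean-vs-corrupted cross term
    W = np.linalg.solve((Q + 1e-5 * np.eye(d + 1)).T, P.T).T  # W = P (Q + eps*I)^-1
    return W, np.tanh(W @ Xb)


# Hypothetical data: 20 metrics for 100 source modules and 80 shifted target modules.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(20, 100))
Xt = rng.normal(loc=0.5, size=(20, 80))
H1 = mda_layer(np.hstack([Xs, Xt]), p=0.5)[1]  # first stacked layer, both domains
H2 = mda_layer(H1, p=0.5)[1]                   # second layer learned on the first
print(H2.shape)  # (20, 180)
```

Because every layer is a single linear solve followed by `tanh`, stacking layers is cheap, which is what makes re-estimating pseudo-labels after each layer affordable.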
• This paper proposes a novel CPDP method named Joint Feature Representation with Double Marginalized Denoising Autoencoders (DMDA-JFR).
• Our method consists of two main parts: joint feature representation learning and progressive distribution matching.
• We use two novel autoencoders to learn global and local feature representations jointly.
• We introduce a repetitious pseudo-labeling strategy to match distributions progressively.
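
The difference between one-stroke and progressive matching can be illustrated with a toy loop. This is a hypothetical sketch, not the DMDA-JFR algorithm: each round re-annotates the target with a nearest-centroid classifier built from the source, measures the class-conditional mean gap (a linear-kernel MMD), and then moves each pseudo-labelled target class only part of the way toward the matching source class. The helper names, the `step` parameter, and the mean-shift update (a stand-in for learning one stacked autoencoder layer) are all assumptions.

```python
import numpy as np


def nearest_centroid_labels(Xs, ys, Xt):
    """Pseudo-label target rows with the class of the nearest source class centroid."""
    classes = np.unique(ys)
    centroids = np.stack([Xs[ys == c].mean(axis=0) for c in classes])
    sq_dists = ((Xt[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[sq_dists.argmin(axis=1)]


def conditional_mean_gap(Xs, ys, Xt, yt):
    """Sum over classes of the squared distance between source and pseudo-labelled
    target class means (a linear-kernel MMD on class-conditional distributions)."""
    gap = 0.0
    for c in np.unique(ys):
        if np.any(yt == c):  # skip classes that received no pseudo-labels
            diff = Xs[ys == c].mean(axis=0) - Xt[yt == c].mean(axis=0)
            gap += float(diff @ diff)
    return gap


def progressive_matching(Xs, ys, Xt, n_rounds=3, step=0.5):
    """Toy loop: refresh pseudo-labels every round, then partially align each class."""
    Xs = np.asarray(Xs, dtype=float)
    ys = np.asarray(ys)
    Xt = np.asarray(Xt, dtype=float).copy()
    history = []
    for _ in range(n_rounds):
        yt = nearest_centroid_labels(Xs, ys, Xt)  # re-annotate on current features
        history.append(conditional_mean_gap(Xs, ys, Xt, yt))
        for c in np.unique(yt):
            mask = yt == c
            shift = Xs[ys == c].mean(axis=0) - Xt[mask].mean(axis=0)
            Xt[mask] += step * shift  # partial move instead of one-stroke alignment
    yt = nearest_centroid_labels(Xs, ys, Xt)      # final pseudo-labels
    history.append(conditional_mean_gap(Xs, ys, Xt, yt))
    return Xt, yt, history


# Toy 2-D data: the target is the source shifted by (+1, +1).
Xs = [[0, 0], [0, 1], [4, 0], [4, 1]]
ys = [0, 0, 1, 1]
Xt = [[1, 1], [1, 2], [5, 1], [5, 2]]
Xt_aligned, yt, history = progressive_matching(Xs, ys, Xt)
print(history)  # [4.0, 1.0, 0.25, 0.0625]
```

In this toy case each round quarters the conditional gap. If the pseudo-labels were wrong in an early round, a later round could still correct them because the labels are recomputed on the partially aligned features rather than fixed once.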
