Abstract
Heterogeneous defect prediction (HDP) refers to predicting defect-proneness of software modules in a target project using heterogeneous metric data from other projects. Existing HDP methods mainly focus on predicting target instances with single source. In practice, there exist plenty of external projects. Multiple sources can generally provide more information than a single project. Therefore, it is meaningful to investigate whether the HDP performance can be improved by employing multiple sources. However, a precondition of conducting HDP is that the external sources are available. Due to privacy concerns, most companies are not willing to share their data. To facilitate data sharing, it is essential to study how to protect the privacy of data owners before they release their data. In this paper, we study the above two issues in HDP. Specifically, to utilize multiple sources effectively, we propose a multi-source selection based manifold discriminant alignment (MSMDA) approach. To protect the privacy of data owners, a sparse representation based double obfuscation algorithm is designed and applied to HDP. Through a case study of 28 projects, our results show that MSMDA can achieve better performance than a range of baseline methods. The improvement is 3.4-$15.3$15.3 percent in g-measure and 3.0-$19.1$19.1 percent in AUC.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.