Abstract

Unlike existing cross-project defect prediction (CPDP) problems, which assume a close relation between the source and target data sets, the heterogeneous cross-project defect prediction (HCPDP) problem allows the target data sets to be entirely different from the source data sets. To narrow the gap between the source and target data sets, we implemented our own algorithm, SLA+, based on the selective learning algorithm. From multiple candidate sources, we select the one most similar to the target data set as the source data set. We then select one or more of the remaining data sets that are similar to both the target and the source as an intermediate domain. The intermediate domain acts as a bridge between the source domain and the target domain, breaking up the large distribution gap that hinders knowledge transfer between them. In addition, we reduce dimensionality by mining latent relationships between features. We conducted experiments on open-source data sets, all of which are heterogeneous. The experiments show that our method achieves results comparable to state-of-the-art HCPDP methods in most cases.
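The source and intermediate-domain selection step can be illustrated with a minimal sketch. The abstract does not say how SLA+ measures similarity between heterogeneous data sets, so this example assumes one plausible choice, not the authors' method: each feature column is min-max normalized, every target column is matched to its closest candidate column by the two-sample Kolmogorov-Smirnov (KS) statistic, and similarity is the average of `1 - KS` over matched columns (symmetrized). All function names and the parameter `k` (number of intermediate data sets) are hypothetical. An intermediate data set is scored by its weaker link, i.e. the minimum of its similarity to the target and to the chosen source, so it must be close to both.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical representation: a data set is a list of feature columns.
# Heterogeneous data sets may have different numbers of columns.
Dataset = List[List[float]]


def min_max(col: List[float]) -> List[float]:
    """Rescale one feature column to [0, 1] so metrics on different scales are comparable."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0.0 for _ in col]
    return [(v - lo) / (hi - lo) for v in col]


def ks_statistic(a: List[float], b: List[float]) -> float:
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def directed_similarity(x: Dataset, y: Dataset) -> float:
    """For each column of y, take its best-matching column of x; average 1 - KS."""
    xs = [min_max(c) for c in x]
    ys = [min_max(c) for c in y]
    scores = [max(1.0 - ks_statistic(cx, cy) for cx in xs) for cy in ys]
    return sum(scores) / len(scores)


def similarity(x: Dataset, y: Dataset) -> float:
    """Symmetrized similarity between two (possibly heterogeneous) data sets."""
    return 0.5 * (directed_similarity(x, y) + directed_similarity(y, x))


def select_source_and_intermediates(
    candidates: Dict[str, Dataset], target: Dataset, k: int = 1
) -> Tuple[str, List[str]]:
    """Pick the candidate most similar to the target as the source, then the k
    remaining candidates that are most similar to BOTH target and source."""
    source = max(candidates, key=lambda n: similarity(candidates[n], target))
    rest = [n for n in candidates if n != source]
    # Weaker-link score: an intermediate must be close to both ends of the bridge.
    bridge = {
        n: min(similarity(candidates[n], target),
               similarity(candidates[n], candidates[source]))
        for n in rest
    }
    intermediates = sorted(rest, key=lambda n: bridge[n], reverse=True)[:k]
    return source, intermediates
```

For example, with a target column `list(range(10))`, a candidate `"A"` holding `list(range(10, 20))` (identical shape after normalization), a heavily skewed candidate `"B"`, and a moderately skewed candidate `"C"`, the sketch picks `"A"` as the source and `"C"` as the intermediate domain. Using the minimum rather than the mean of the two similarities is a design choice here: it rejects a data set that resembles only one end of the bridge.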