Abstract
Various software metrics and statistical models have been developed to help companies predict software defects. Traditional software defect prediction approaches use historical data about previous bugs in a project to build predictive machine learning models. However, in many cases the historical testing data available in a project is scarce, i.e., very few or even no labeled training instances are available, which results in a low-quality defect prediction model. To overcome this limitation, Cross-Project Defect Prediction (CPDP) can be adopted to learn a defect prediction model for a project of interest (i.e., a target project) by reusing (transferring) data collected from several previous projects (i.e., source projects). In this paper, we focus on neighborhood-based instance selection techniques for CPDP, which select labeled instances in the source projects that are similar to the unlabeled instances available in the target project. Despite their simplicity, these techniques have limitations, which we address in this work. First, although they can select representative source instances, the quality of the selected instances is usually not addressed. Second, bug prediction datasets are normally imbalanced (i.e., there are more non-defect instances than defect ones), which can harm learning performance. We propose a new transfer learning approach for CPDP in which instances selected by a neighborhood-based technique are filtered by the Fuzzy-Rough Instance Selection (FRIS) technique to remove noisy instances from the training set. Next, to address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is adopted to oversample the minority (defect-prone) class, thus increasing the chance of finding bugs correctly. Experiments were performed on a benchmark set of Java projects, achieving promising results.
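The three stages described above (neighborhood-based selection, noise filtering, minority oversampling) can be illustrated with a minimal sketch. This is not the authors' implementation. All function names, the toy data, and the values of k are hypothetical. In particular, the noise filter below is a simple nearest-neighbor agreement rule used only as a stand-in for FRIS, and the oversampling step is a simplified SMOTE-style interpolation rather than a full SMOTE implementation.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_neighbors(source, target, k=3):
    """For each unlabeled target instance, keep its k nearest labeled source instances."""
    chosen = set()
    for t in target:
        ranked = sorted(range(len(source)), key=lambda i: euclidean(source[i][0], t))
        chosen.update(ranked[:k])
    return [source[i] for i in sorted(chosen)]

def drop_noisy(instances, k=3):
    """Stand-in for FRIS (assumption): drop an instance when none of its
    k nearest neighbors shares its label."""
    kept = []
    for i, (x, y) in enumerate(instances):
        others = [inst for j, inst in enumerate(instances) if j != i]
        nearest = sorted(others, key=lambda inst: euclidean(inst[0], x))[:k]
        if not nearest or any(label == y for _, label in nearest):
            kept.append((x, y))
    return kept

def oversample_minority(instances, minority_label=1, k=2, rng=None):
    """SMOTE-style oversampling: create synthetic minority instances by
    interpolating between a minority instance and one of its k nearest
    minority neighbors, until both classes have the same size."""
    rng = rng or random.Random(0)
    minority = [x for x, y in instances if y == minority_label]
    majority_count = len(instances) - len(minority)
    if len(minority) < 2:
        return list(instances)  # nothing to interpolate between
    synthetic = []
    while len(minority) + len(synthetic) < majority_count:
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbor = rng.choice(sorted(others, key=lambda m: euclidean(m, base))[:k])
        gap = rng.random()
        new_x = tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        synthetic.append((new_x, minority_label))
    return list(instances) + synthetic

# Toy data: (features, label) with label 1 = defect-prone, 0 = non-defect.
source = [((1.0, 1.0), 0), ((1.2, 0.9), 0), ((0.9, 1.1), 0), ((1.1, 1.2), 0),
          ((5.0, 5.0), 1), ((5.2, 4.9), 1), ((9.0, 9.0), 0)]
target = [(1.0, 1.0), (5.1, 5.0)]  # unlabeled target-project instances

selected = select_neighbors(source, target, k=3)
filtered = drop_noisy(selected, k=3)
training_set = oversample_minority(filtered)
```

The resulting `training_set` would then be passed to any classifier. The order of the stages mirrors the abstract: selection first, filtering second, oversampling last, so that synthetic instances are generated only from instances that survived the noise filter.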