Abstract
The objective of cross-project defect prediction (CPDP) is to develop a model that trains bugs on current source projects and predicts defects of target projects. Due to the complexity of projects, CPDP is a challenging task, and the precision estimated is not always trustworthy. Our goal is to predict the bugs in the new projects by training our model on the current projects for cross-projects to save time, cost, and effort. We used experimental research and the type of research is explanatory. Our research method is controlled experimentation, for which our independent variable is prediction accuracy and dependent variables are hyper-parameters which include learning rate, epochs, and dense layers of neural networks. Our research approach is quantitative as the dataset is quantitative. The design of our research is 1F1T (1 factor and 1 treatment). To obtain the results, we first carried out exploratory data analysis (EDA). Using EDA, we found that the dataset is multi-class. The dataset contains 11 different projects consisting of 28 different versions of all the projects in total. We also found that the dataset has significant issues of noise, class imbalance, and distribution gaps between different projects. We pre-processed the dataset for experimentation by resolving all these issues. To resolve the issue of noise, we removed duplication from the dataset by removing redundant rows. We then covered the data distribution gap between different sources and target projects using the min-max normalization technique. After covering the data distribution gap, we generated synthetic data using a CTGANsynthesizer to solve class imbalance issues. We solved the class imbalance issue by generating an equal number of instances, as well as an equal number of output classes. After carrying out all of these steps, we obtained normalized data. We applied the hybrid feature selection technique on the pre-processed data to optimize the feature set. We obtained significant results of an average AUC of 75.98%. From the empirical study, it was demonstrated that feature selection and hyper-parameter tuning have a significant impact on defect prediction accuracy in cross-projects.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.