Abstract

In previous studies, when defect data are collected and the fix for a defect spans multiple modules, each involved module is labeled as defective. Defect prediction models are then built on the features of each individual module, ignoring the potential associations between the modules involved in the same defect (referred to as “intra-defect associations”). Because cross-module defects may be numerous in practice, we hypothesize that these intra-defect associations could play a crucial role in enhancing defect prediction performance. However, no empirical evidence yet supports or refutes this hypothesis. We therefore conduct a comprehensive study to explore the implications of intra-defect associations for defect prediction. We first examine the proportion of cross-module defects and the relationships between the involved modules. The results reveal that, at the function level, the majority of defects span multiple functions, and most of these cross-module defects exhibit implicit dependencies. Inspired by these findings, we propose a novel data processing approach for building defect prediction models. This approach leverages intra-defect associations by merging the modules involved in the same defect into new instances, whose variables are the mean or median of the merged modules' values, to augment the training data. The experimental results indicate that considering intra-defect associations can significantly improve defect prediction performance in both the ranking and classification scenarios. This study provides valuable insights into the implications of intra-defect associations for defect prediction.
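
The merging step described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the dictionary-based data layout, and the decision to label each merged instance as defective are all assumptions made for illustration.

```python
from statistics import mean, median


def merge_defect_group(features_by_module, group, agg="mean"):
    """Merge the modules fixed for one defect into a single synthetic instance.

    features_by_module: dict mapping a module name to its list of metric values
    group: names of the modules involved in the same defect (hypothetical input)
    agg: "mean" or "median", the two aggregations named in the abstract
    """
    if agg not in ("mean", "median"):
        raise ValueError("agg must be 'mean' or 'median'")
    reducer = mean if agg == "mean" else median
    rows = [features_by_module[name] for name in group]
    # Aggregate column-wise: one value per metric across the grouped modules.
    return [reducer(column) for column in zip(*rows)]


def augment_training_data(features_by_module, labels, defect_groups, agg="mean"):
    """Return (X, y): the original instances plus one merged instance per
    cross-module defect."""
    X = [list(values) for values in features_by_module.values()]
    y = [labels[name] for name in features_by_module]
    for group in defect_groups:
        if len(group) < 2:
            continue  # a single-module defect has nothing to merge
        X.append(merge_defect_group(features_by_module, group, agg))
        y.append(1)  # assumption: a merged instance is labeled defective
    return X, y
```

With three functions fixed for the same defect, `augment_training_data` appends one extra row holding their per-metric mean (or median) to the training set and leaves the original rows unchanged.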
