Improving deep‐learning‐based fault localization with resampling

Zhuo Zhang, Junhao Wen, Xiaoguang Mao, Meng Yan, Ling Xu, Yan Lei

doi:10.1002/smr.2312

Abstract

Many recent fault localization approaches use deep learning to learn an effective localization model, offering a fresh perspective with promising results. However, localization models are generally learned from class‐imbalanced datasets; that is, failing test cases are far fewer than passing test cases. This imbalance may degrade the accuracy of the learned localization models. Thus, in this paper, we explore using data resampling to reduce the negative effect of the class imbalance problem and improve the accuracy of the models learned in deep‐learning‐based fault localization. Specifically, the learning process of deep‐learning‐based fault localization may require duplicating essential data to strengthen the weak but beneficial experience that class‐imbalanced datasets provide. We leverage the property of test cases (i.e., passing or failing) to identify failing test cases as the essential data to duplicate, and propose an iterative oversampling approach that resamples failing test cases to produce a class‐balanced test suite. We apply the test case resampling to representative localization models using deep learning. Our empirical results on eight large‐sized programs with real faults and four large‐sized programs with seeded faults show that the test case resampling significantly improves fault localization effectiveness.
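
The resampling idea can be illustrated with a minimal Python sketch. The abstract does not give the details of the iterative procedure, so this is only one plausible reading under stated assumptions: the test suite is represented as a coverage matrix (one row per test case, one column per statement) with a result vector (1 = failing, 0 = passing), and failing test cases are copied one at a time, in round-robin order, until they are as numerous as the passing ones. The function name `oversample_failing` and the data layout are hypothetical, not taken from the paper.

```python
from typing import List, Tuple


def oversample_failing(
    coverage: List[List[int]], results: List[int]
) -> Tuple[List[List[int]], List[int]]:
    """Duplicate failing test cases until the suite is class balanced.

    coverage: one row per test case; entry is 1 if the statement was executed.
    results:  1 for a failing test case, 0 for a passing one.
    Assumption: copies are taken round-robin over the original failing tests.
    """
    if len(coverage) != len(results):
        raise ValueError("coverage and results must have the same length")

    failing = [i for i, r in enumerate(results) if r == 1]
    if not failing:
        raise ValueError("at least one failing test case is required")

    n_pass = len(results) - len(failing)
    n_fail = len(failing)

    # Start from a copy of the original suite so the inputs are not mutated.
    new_coverage = [list(row) for row in coverage]
    new_results = list(results)

    # Append one failing test case per iteration until the classes are balanced.
    k = 0
    while n_fail < n_pass:
        idx = failing[k % len(failing)]
        new_coverage.append(list(coverage[idx]))
        new_results.append(1)
        n_fail += 1
        k += 1

    return new_coverage, new_results


if __name__ == "__main__":
    # Hypothetical suite: four passing test cases and one failing test case.
    cov = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1], [1, 0, 0]]
    res = [0, 0, 0, 1, 0]
    balanced_cov, balanced_res = oversample_failing(cov, res)
    print(len(balanced_cov), sum(balanced_res))
```

The balanced matrix and result vector would then replace the original ones as training data for a deep-learning localization model; passing test cases are left untouched, and a suite that already has at least as many failing as passing test cases is returned unchanged.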

Full Text