BCL-FL: A Data Augmentation Approach with Between-Class Learning for Fault Localization

Yan Lei, Zhou Xu, Meng Yan, Sheng Huang, Chunyan Liu, Huan Xie

doi:10.1109/saner53432.2022.00045

Abstract

Automated fault localization (FL) techniques collect runtime information as input data and then analyze it to identify the relationship between program statements and failures. They usually take advantage of the statistics of the input data to develop a suspiciousness evaluation methodology (e.g., spectrum-based formulas and deep neural network models) by exploring the underlying correlation in the input data. Thus, the quality of input data is critical for FL. In practice, developers seek to generate adequate test cases for testing the functionality or the robustness of a subject program. However, for a given fault, most test cases pass and very few fail, since only a very small portion of the input domain leads to a program failure. FL therefore usually faces an imbalanced data problem, and this problem has been proven to have an adverse effect on FL effectiveness. To address this problem, we propose BCL-FL, a data augmentation approach based on between-class learning, which produces new synthesized failed test samples by mixing two classes of real test cases (i.e., a passed test case and a failed one) with a random ratio. Specifically, BCL-FL uses the characteristics of real failed test cases to design a data synthesis formula suited to failed test samples, which brings the synthesized failed test samples closer to real test cases. Since the synthesized data differs from real data, we assign each synthesized sample a continuous label between 0 and 1 according to the mixing ratio of the original labels. We take the synthesized failed test samples and the original test cases as the balanced input data for FL techniques to address the imbalanced data problem.
To evaluate the effectiveness of BCL-FL, we conduct large-scale experiments on 287 faulty versions of eight large programs (from ManyBugs and Defects4J) using six state-of-the-art FL approaches. The experimental results show that BCL-FL significantly improves the effectiveness of existing FL techniques; e.g., BCL-FL improves the CNN-FL approach in Top-1, Top-5, and Top-10 by 150%, 136.36%, and 193.1%, respectively.
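
The mixing step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual data synthesis formula, so the code below assumes plain linear mixing of two coverage vectors and is not the authors' method. The function names `mix_test_cases` and `augment`, the label convention (0.0 for passed, 1.0 for failed), the uniform random ratio, and the stopping rule (synthesize until the two classes are equal in size) are all hypothetical choices made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mix_test_cases(
    passed: Sequence[float],
    failed: Sequence[float],
    ratio: float,
) -> Tuple[List[float], float]:
    """Linearly mix one passed and one failed coverage vector.

    `ratio` is the weight given to the failed test case. The returned
    label mixes the original labels with the same ratio, assuming
    0.0 = passed and 1.0 = failed (an assumed convention).
    """
    if len(passed) != len(failed):
        raise ValueError("coverage vectors must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    mixed = [ratio * f + (1.0 - ratio) * p for p, f in zip(passed, failed)]
    label = ratio * 1.0 + (1.0 - ratio) * 0.0
    return mixed, label


def augment(
    passed_cases: Sequence[Sequence[float]],
    failed_cases: Sequence[Sequence[float]],
    rng: Optional[random.Random] = None,
) -> List[Tuple[List[float], float]]:
    """Synthesize samples until both classes have the same size.

    Assumption: each synthesized sample mixes a randomly chosen passed
    case and a randomly chosen failed case with a uniform random ratio.
    """
    if not failed_cases:
        raise ValueError("at least one failed test case is required")
    rng = rng or random.Random()
    n_needed = max(0, len(passed_cases) - len(failed_cases))
    synthesized = []
    for _ in range(n_needed):
        p = rng.choice(passed_cases)
        f = rng.choice(failed_cases)
        synthesized.append(mix_test_cases(p, f, rng.random()))
    return synthesized


# Usage: each row is the statement coverage of one test case
# (1 = statement executed, 0 = not executed).
passed_cases = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
failed_cases = [[1, 1, 1, 0]]
new_samples = augment(passed_cases, failed_cases, random.Random(0))
```

With three passed cases and one failed case, `augment` returns two synthesized samples, each a pair of a mixed coverage vector and a soft label. The synthesized samples would then be appended to the original test cases before any FL technique is applied.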

Similar Papers

A deep semantics-aware data augmentation method for fault localization
Jian Hu ... Yan Lei
Information and Software Technology | VOL. 168
24 Jan 2024

Evaluation and Analysis of Spectrum-Based Fault Localization with Modified Similarity Coefficients for Software Debugging
Yi-Sian You ... Chin-Yu Huang
01 Jul 2013

FDG: a precise measurement of fault diagnosability gain of test cases
Gabin An ... Shin Yoo
18 Jul 2022

VFL: Variable-based fault localization
Jeongho Kim ... Jindae Kim
Information and Software Technology | VOL. 107
30 Nov 2018
