A deep semantics-aware data augmentation method for fault localization

Jian Hu, Yan Lei

doi:10.1016/j.infsof.2024.107409

Abstract

Context: Fault localization (FL) techniques identify the relationship between program statements and failures by analyzing runtime information. They rely on the statistics of the input data to uncover the correlations within it, so the quality of that data is critical for FL. In practice, however, passing tests significantly outnumber failing tests for a given fault. The resulting class imbalance can adversely affect the effectiveness of FL.

Objective: To tackle the issue of imbalanced data in fault localization, we propose PRAM: a deeP semantic-awaRe dAta augmentation Method to improve the effectiveness of FL methods.

Method: PRAM first uses program dependencies to enhance the semantic context, thus showing how a failure is caused. It then employs the mixup method to synthesize new failing test samples by merging two real failing test cases with a random ratio, which balances the input data. Finally, PRAM feeds the balanced data, consisting of the synthesized failing test cases and the original test cases, to FL techniques. To evaluate the effectiveness of PRAM, we conducted large-scale experiments on 330 versions of nine large-sized real programs, covering six state-of-the-art FL methods, two data optimization methods, and two data augmentation methods.

Results: Our experimental results show that PRAM outperforms the compared methods in most cases on Top-K metrics and reduces the number of checked statements by 40.38% to 80.04% compared with the original FL methods. Furthermore, PRAM reduces the number of checked statements by 16.92% to 56.98% compared with the data optimization methods and by 12.48% to 26.82% compared with the data augmentation methods.

Conclusion: The experimental results show that PRAM is more effective not only than the original FL methods but also than two representative data optimization methods and two data augmentation methods, which indicates that PRAM is a universally effective data augmentation method for various FL methods.
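The mixup step described in the abstract (merging two real failing test cases with a random ratio until the data is balanced) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mixup_failing_tests`, the representation of a test case as a 0/1 statement-coverage vector, and the uniform distribution of the mixing ratio are all assumptions made for illustration, and the dependency-based context enhancement that PRAM applies first is omitted.

```python
import random


def mixup_failing_tests(failing, n_passing, seed=0):
    """Synthesize failing samples until failing tests match n_passing.

    Hypothetical helper: `failing` is a list of coverage vectors
    (one number per statement), an assumed representation.
    """
    if len(failing) < 2:
        raise ValueError("mixup needs at least two real failing tests")
    rng = random.Random(seed)
    synthetic = []
    while len(failing) + len(synthetic) < n_passing:
        # Pick two distinct real failing tests.
        a, b = rng.sample(failing, 2)
        # Random mixing ratio; uniform on [0, 1) is an assumption here.
        lam = rng.random()
        # Convex combination, statement by statement. Both parents
        # fail, so the synthetic sample is labeled as failing too.
        synthetic.append([lam * x + (1 - lam) * y for x, y in zip(a, b)])
    return synthetic


# Toy data: 2 failing tests over 4 statements, balanced against 6 passing tests.
failing = [[1, 0, 1, 1], [1, 1, 0, 1]]
synthetic = mixup_failing_tests(failing, n_passing=6)
print(len(synthetic), "synthetic failing samples")
```

In this toy run, statements covered by both parent tests keep a value of 1 in every synthetic sample, while statements covered by only one parent receive a fractional value between 0 and 1.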

More From: Information and Software Technology

Similar Papers

BCL-FL: A Data Augmentation Approach with Between-Class Learning for Fault Localization
Yan Lei ... Huan Xie
01 Mar 2022

Fault localization for automated program repair: effectiveness, performance, repair correctness
Fatmah Yousef Assiri ... James M Bieman
Software Quality Journal | VOL. 25
26 Mar 2016

Trace matrix optimization for fault localization
Jian Hu
The Journal of Systems & Software | VOL. 208
18 Nov 2023

FDG: a precise measurement of fault diagnosability gain of test cases
Gabin An ... Shin Yoo
18 Jul 2022
