Trace matrix optimization for fault localization

Jian Hu

doi:10.1016/j.jss.2023.111900

Abstract

Fault localization (FL) techniques gather trace information as input data and analyze it to identify the relationship between program statements and failures. The input trace matrix is therefore essential for fault localization. However, trace matrices as currently used face two main challenges. First, coincidental correctness (CC), in which a faulty statement is executed yet the program still produces correct output, adversely affects the effectiveness of FL. Second, the large disparity between the number of failing and passing test cases creates a data imbalance problem for fault localization. To overcome these issues, we propose TRAIN: a Two-stage tRace mAtrix optImizatioN method for fault localization. In the first stage, TRAIN uses an improved cluster analysis to identify and exclude the CC tests from the trace matrix. In the second stage, TRAIN uses data augmentation to increase the number of failing test cases and so further balance the trace matrix. The optimized trace matrix is then used as input data in the FL pipeline to locate the faulty statements. In extensive experiments on 330 faulty versions of nine large-sized programs (obtained from Defects4J, ManyBugs, and SIR) using six state-of-the-art FL methods, TRAIN markedly improves FL effectiveness.
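To make the effect of coincidental correctness on a trace matrix concrete, the following is a minimal sketch and not the TRAIN implementation. It assumes a binary trace matrix (rows are tests, columns are statements) and uses the well-known Ochiai formula as a stand-in FL method; the abstract does not name the six FL methods evaluated. The toy data, the function name `ochiai`, and the hand-picked CC test are all hypothetical. In TRAIN the CC tests are identified by cluster analysis, which is not reproduced here, and the second-stage data augmentation is not sketched either.

```python
import math

def ochiai(matrix, failed):
    """Return one Ochiai suspiciousness score per statement (column).

    matrix[t][s] is 1 if test t executed statement s, else 0.
    failed[t] is True if test t failed.
    """
    total_failed = sum(failed)
    scores = []
    for s in range(len(matrix[0])):
        # ef / ep: failing / passing tests that executed statement s
        ef = sum(1 for row, f in zip(matrix, failed) if row[s] and f)
        ep = sum(1 for row, f in zip(matrix, failed) if row[s] and not f)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

# Hypothetical toy data: statement index 2 plays the role of the faulty statement.
matrix = [
    [1, 1, 1, 0],  # t0: fails
    [1, 0, 1, 1],  # t1: passes although it executes statement 2 (a CC test)
    [1, 1, 0, 1],  # t2: passes
    [1, 0, 0, 1],  # t3: passes
]
failed = [True, False, False, False]

before = ochiai(matrix, failed)

# Stage-one idea, by hand: drop the CC test t1 from the trace matrix.
cc_tests = {1}  # assumed to be known here; TRAIN finds these via clustering
kept = [i for i in range(len(matrix)) if i not in cc_tests]
after = ochiai([matrix[i] for i in kept], [failed[i] for i in kept])

print("before:", [round(x, 3) for x in before])
print("after: ", [round(x, 3) for x in after])
```

With the CC test present, statement 2 ties with statement 1 in this toy example; once the CC test is excluded, statement 2 is ranked strictly highest. Note that only one of the four tests fails, which is the kind of imbalance the second stage is meant to address.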

Full Text