NeuralCCD: Integrating Multiple Features for Neural Coincidental Correctness Detection
Fault localization seeks to locate the suspicious statements responsible for a program failure. Experimental evidence shows that fault localization effectiveness is adversely affected by the existence of coincidental correctness (CC) test cases, where a CC test case is one that executes a fault but produces no failure. Even worse, CC test cases are prevalent in realistic testing and debugging, which severely degrades fault localization effectiveness. Thus, it is indispensable to accurately detect CC test cases and alleviate their harmful effect on fault localization effectiveness. To address this problem, we propose NeuralCCD: a neural coincidental correctness detection approach that integrates multiple features. Specifically, NeuralCCD first leverages suspiciousness score, coverage ratio and similarity to define three CC detection features. Based on these CC detection features and CC labels, NeuralCCD utilizes a multi-layer perceptron to learn a separate feature-based model for each program, and finally combines the trained models of different programs into an ensemble system to detect CC test cases. To evaluate the effectiveness of NeuralCCD, we conduct large-scale experiments on 247 faulty versions of five representative benchmarks and compare NeuralCCD with four state-of-the-art CC detection approaches. The experimental results show that NeuralCCD significantly improves the effectiveness of CC detection, e.g., NeuralCCD yields improvements of up to 109.5%, 93% and 81.3% in Top-1, Top-3 and Top-5 over Tech-I when used with the Dstar formula.
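To make the reference to the Dstar formula concrete, the following is a minimal sketch of how a DStar suspiciousness score is computed from a statement's coverage spectrum, and how CC tests lower the score of a faulty statement. The function name `dstar`, the default exponent of 2, and the toy counts are illustrative assumptions, not details taken from the paper.

```python
import math

def dstar(ef: int, ep: int, nf: int, star: int = 2) -> float:
    """DStar suspiciousness: ef**star / (ep + nf).

    ef: failing tests that execute the statement
    ep: passing tests that execute the statement
    nf: failing tests that do not execute the statement
    """
    denom = ep + nf
    if denom == 0:
        # Executed by every failing test and by no passing test.
        return math.inf if ef > 0 else 0.0
    return ef ** star / denom

# Hypothetical faulty statement: 2 failing tests and 3 passing tests execute it.
# Two of those passing tests are coincidentally correct.
with_cc = dstar(ef=2, ep=3, nf=0)     # CC tests counted as ordinary passing tests
without_cc = dstar(ef=2, ep=1, nf=0)  # the two CC tests removed from the suite
print(with_cc, without_cc)
```

In this toy case the score of the faulty statement rises once the CC tests are removed, which is the effect CC detection aims for.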
- Research Article
- 10.1109/tr.2026.3668421
- Jan 1, 2026
- IEEE Transactions on Reliability
Coincidental Correctness (CC) arises when a test case executes a faulty entity in a program without causing a failure. This phenomenon injects noise into coverage information, as CC tests weaken the connection between faulty entities and test failures. Since many fault localization (FL) approaches rely on analyzing test execution traces to locate faulty entities, the compromised reliability of test results directly undermines FL accuracy. Furthermore, the detrimental effects of CC extend beyond fault localization to subsequent software maintenance tasks such as automatic program repair. Therefore, identifying and mitigating CC tests is critical not only for enhancing FL but also for ensuring robust software quality assurance. Thus, we propose FusionCC: an approach that applies multiscale coverage features and handcrafted features to fuse complementary feature representations for CC test case detection. Specifically, FusionCC first refines the original coverage data by filtering out noisy, irrelevant elements, then extracts multiscale features from the refined matrix, and finally fuses the coverage and handcrafted features to generate highly informative feature representations for CC detection. FusionCC realizes a comprehensive fusion of complementary features across different scales and from diverse sources, which significantly enhances the accuracy of CC detection. To evaluate the effectiveness of FusionCC, we conduct large-scale experiments on 277 faulty versions of six representative benchmarks. The experimental results show that FusionCC significantly improves CC detection (e.g., average improvements of 50.93% in precision and 82.03% in $F_{1}$ value compared to state-of-the-art CC detection approaches) and fault localization effectiveness (e.g., averages of 10.33, 19.33 and 25.67 faults found in terms of the Top-1, Top-3 and Top-5 metrics under the relabel strategy, compared with state-of-the-art FL approaches).
- Conference Article
3
- 10.1109/issre59848.2023.00074
- Oct 9, 2023
A test suite is indispensable for fault localization because the execution information of its test cases helps locate statements suspected of being faulty. There exists a type of test case known as a coincidental correctness (CC) test case, which executes the faulty statement yet produces the anticipated output. Existing studies have shown that CC test cases harm fault localization effectiveness. Therefore, it is crucial to detect CC test cases to mitigate their adverse impact on fault localization. To address this issue, we propose ContraCC: a CC test case detection method using contrastive learning. The insight of ContraCC is that the internal structural information of source test case execution data should be beneficial for CC detection, yet suitable representation methods are lacking. Inspired by this insight, ContraCC uses contrastive learning to learn new differentiated representations as test case vectors, which differentiate between similar and dissimilar pairs of test cases by maximizing their similarity within the same class and minimizing it between different classes. Based on the contrastive learning representations (i.e., test case vectors), ContraCC adopts a multi-layer perceptron for binary classification to detect CC in downstream tasks. To evaluate the effectiveness of ContraCC, we conduct large-scale experiments on widely used benchmarks by comparing ContraCC with five state-of-the-art CC test case detection methods and applying ContraCC to fault localization. The experimental results show that ContraCC outperforms four state-of-the-art methods (e.g., from 10% to 84% improvement in Top-N on the best-performing baseline NeuralCCD) and significantly improves fault localization effectiveness (e.g., 24% improvement on the best-performing baseline Dstar).
- Research Article
4
- 10.1109/tse.2024.3481893
- Dec 1, 2024
- IEEE Transactions on Software Engineering
Coincidental correctness (CC) is a situation in which, during the execution of a test case, the buggy entity is executed but the program behaves correctly as expected. Many automated fault localization (FL) techniques use runtime information to discover the underlying connection between the executed buggy entity and the failing test result. The existence of CC weakens this connection, misleads the FL algorithms into building inaccurate models, and consequently decreases the localization accuracy. To alleviate the adverse effect of CC on FL, CC detection techniques have been proposed to identify the possible CC tests via heuristic or machine learning algorithms. However, their precision is not satisfactory, since they overestimate the possible CC tests and are insufficient in learning the deep semantic features. In this work, we propose a novel Triplet network-based Coincidental Correctness detection technique (i.e., TriCoCo) to overcome the limitations of prior work. TriCoCo narrows the possible CC tests by designing three features to identify genuine passing tests. Instead of using all tests as inputs, as existing techniques do, TriCoCo takes the identified genuine passing tests and the failing ones to train a triplet model that can evaluate their relative distance. Finally, TriCoCo uses the trained triplet model to infer, for each of the remaining passing tests, the probability that it is a CC test. We conduct large-scale experiments to evaluate TriCoCo on the widely used Defects4J benchmark. The results demonstrate that TriCoCo can improve not only the precision of CC detection but also the effectiveness of FL techniques, e.g., the precision of TriCoCo is 80.33% on average, and TriCoCo boosts the efficacy of DStar by 18% to 74% in terms of the MFR metric when compared to seven state-of-the-art CC detection baselines.
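As a rough illustration of what "evaluating relative distance" with a triplet model can mean, the sketch below computes the standard triplet margin loss on plain vectors. The abstract does not say how TriCoCo forms its anchor, positive, and negative samples, nor which distance or margin it uses, so the Euclidean distance, the margin of 1.0, and the toy vectors here are all assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical embeddings: the positive shares the anchor's class, the negative does not.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
print(well_separated, badly_separated)  # 0.0 5.0
```

A model trained with such a loss places same-class tests near each other, so the distance from an unlabeled passing test to the failing tests can serve as a signal that it may be coincidentally correct.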
- Research Article
8
- 10.1504/ijapr.2018.092520
- Jan 1, 2018
- International Journal of Applied Pattern Recognition
Although empirical studies have confirmed the effectiveness of spectrum-based fault localization (SBFL) techniques, their performance may be degraded due to the presence of some undesired circumstances such as the existence of coincidental correctness (CC), where one or more passing test cases exercise a faulty statement and thus cause some confusion in deciding whether the underlying exercised statement is faulty or not. This article aims at improving SBFL effectiveness by mitigating the effect of CC test cases. In this regard, a new method is proposed that uses a support vector machine (SVM) with a customized kernel function. To build the kernel function, we applied a new sequence-matching algorithm that measures the similarities between passing and failing executions. We conducted some experiments to assess the proposed method. The results show that our method can effectively improve the performance of SBFL techniques.
- Conference Article
25
- 10.1109/icst.2012.130
- Apr 1, 2012
Coincidentally correct test cases are those that execute faulty statements but do not cause failures. Such test cases reduce the effectiveness of spectrum-based fault localization techniques, such as Ochiai. These techniques calculate a suspiciousness score for each statement. The suspiciousness score estimates the likelihood that the program will fail if the statement is executed. The presence of coincidentally correct test cases reduces the suspiciousness score of the faulty statement, thereby reducing the effectiveness of fault localization. We present two approaches that predict coincidentally correct test cases and use the predictions to improve the effectiveness of spectrum-based fault localization. In the first approach, we assign weights to passing test cases such that the test cases that are likely to be coincidentally correct obtain low weights. Then we use the weights to calculate suspiciousness scores. In the second approach, we iteratively predict and remove coincidentally correct test cases, and calculate the suspiciousness scores with the reduced test suite. In this dissertation, we investigate the cost and effectiveness of our approach to predicting coincidentally correct test cases and utilizing the predictions. We report the results of our preliminary evaluation of effectiveness and outline our research plan.
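The first approach above can be sketched as follows. The Ochiai formula itself is standard, but the abstract does not say how the weights enter the score; this sketch assumes the count of passing tests covering a statement is replaced by the sum of their weights. The function names and toy data are hypothetical.

```python
import math

def ochiai(ef: float, ep: float, nf: float) -> float:
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep))."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom > 0 else 0.0

def weighted_ochiai(covered, passed, weights):
    """Ochiai for one statement, with each passing test counted by its weight.

    covered[i]: True if test i executes the statement
    passed[i]:  True if test i passed
    weights[i]: weight of test i (only used for passing tests)
    """
    ef = sum(1 for c, ok in zip(covered, passed) if c and not ok)
    nf = sum(1 for c, ok in zip(covered, passed) if not c and not ok)
    ep = sum(w for c, ok, w in zip(covered, passed, weights) if c and ok)
    return ochiai(ef, ep, nf)

# Hypothetical faulty statement executed by one failing and three passing tests.
covered = [True, True, True, True]
passed = [False, True, True, True]
plain = weighted_ochiai(covered, passed, [1.0, 1.0, 1.0, 1.0])
# Tests 1 and 2 are predicted to be coincidentally correct and get weight 0.
reweighted = weighted_ochiai(covered, passed, [1.0, 0.0, 0.0, 1.0])
print(plain, reweighted)
```

With all weights equal to 1 the score is the ordinary Ochiai value (0.5 here). Setting a weight to 0 has the same effect on this statement as removing that test, which connects the weighting approach to the removal approach.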
- Conference Article
14
- 10.1109/tase.2014.16
- Sep 1, 2014
Spectrum-based fault localization techniques leverage coverage information to identify the faulty elements of the program via passed and failed runs. However, the effectiveness of these techniques can be adversely affected by coincidental correctness, which occurs when faulty elements are executed but the program produces the correct output. This paper proposes a clustering-based strategy to improve the effectiveness of spectrum-based fault localization. The basis of this strategy is that test cases in the same cluster have similar behaviors. Our experimental results show that the percentage of clusters containing coincidentally correct test cases, among clusters that do not contain failed test cases, is usually smaller than the percentage of coincidentally correct test cases among passed test cases. By clustering test cases and reconstructing the coverage matrix, our extensive experiments demonstrated that the fault localization accuracy of spectrum-based fault localization techniques can be effectively improved.
- Conference Article
79
- 10.1109/icst.2010.22
- Jan 1, 2010
Researchers have argued that for a failure to be observed the following three conditions must be met: 1) the defect is executed, 2) the program has transitioned into an infectious state, and 3) the infection has propagated to the output. Coincidental correctness arises when the program produces the correct output while conditions 1) and 2) are met but not 3). In previous work, we showed that coincidental correctness is prevalent and demonstrated that it is a safety-reducing factor for coverage-based fault localization. This work aims at cleansing test suites of coincidental correctness to enhance fault localization. Specifically, given a test suite in which each test has been classified as failing or passing, we present three variations of a technique that identify the subset of passing tests that are likely to be coincidentally correct. We evaluated the effectiveness of our techniques by empirically quantifying the following: 1) how accurately they identified the coincidentally correct tests, 2) how much they improved the effectiveness of coverage-based fault localization, and 3) how much coverage decreased as a result of applying them. Using our better-performing technique and configuration, the safety and precision of fault localization were improved for 88% and 61% of the programs, respectively.
- Research Article
98
- 10.1145/2559932
- Feb 1, 2014
- ACM Transactions on Software Engineering and Methodology
Researchers have argued that for failure to be observed the following three conditions must be met: $C_R$ = the defect was reached; $C_I$ = the program has transitioned into an infectious state; and $C_P$ = the infection has propagated to the output. Coincidental Correctness (CC) arises when the program produces the correct output while condition $C_R$ is met but not $C_P$. We recognize two forms of coincidental correctness, weak and strong. In weak CC, $C_R$ is met, whereas $C_I$ might or might not be met, whereas in strong CC, both $C_R$ and $C_I$ are met. In this work we first show that CC is prevalent in both of its forms and demonstrate that it is a safety reducing factor for Coverage-Based Fault Localization (CBFL). We then propose two techniques for cleansing test suites from coincidental correctness to enhance CBFL, given that the test cases have already been classified as failing or passing. We evaluated the effectiveness of our techniques by empirically quantifying their accuracy in identifying weak CC tests. The results were promising, for example, the better performing technique, using 105 test suites and statement coverage, exhibited 9% false negatives, 30% false positives, and no false negatives nor false positives in 14.3% of the test suites. Also using 73 test suites and more complex coverage, the numbers were 12%, 19%, and 15%, respectively.
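A toy program can illustrate the reach, infection, and propagation conditions and the difference between the weak and strong forms. The functions and the seeded fault below are hypothetical and exist only for illustration; the oracle `correct` would not be available in practice.

```python
def correct(x: int) -> bool:
    y = x * 2
    return y > 0

def faulty(x: int) -> bool:
    y = x + 2  # seeded fault: should be x * 2
    return y > 0

def classify(x: int) -> str:
    # The faulty line runs on every input, so the defect is always reached.
    infected = (x + 2) != (x * 2)         # internal state differs from the correct run
    propagated = faulty(x) != correct(x)  # output differs from the correct run
    if propagated:
        return "failure"
    if infected:
        return "strong CC"
    return "weak CC only"

for x in (-1, 2, 3):
    print(x, classify(x))
```

For `x = -1` the infection propagates and the test fails. For `x = 3` the intermediate value is wrong (5 instead of 6) but the output is still correct, so the test is strongly coincidentally correct. For `x = 2` the faulty line is executed but produces the right intermediate value, so the test is coincidentally correct only in the weak sense.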
- Research Article
2
- 10.3390/sym14061267
- Jun 19, 2022
- Symmetry
An important research aspect of Spectrum-Based Fault Localization (SBFL) is the factors that influence the effectiveness of suspiciousness formulas from the perspective of symmetry. Coincidental correctness is one of the most important factors impacting the effectiveness of suspiciousness formulas. The influence of coincidental correctness on fault localization has attracted a large amount of empirical research; however, this work can hardly be considered sufficiently comprehensive when there are a large number of symmetrical suspiciousness formulas. Therefore, we first develop an innovative theoretical framework with function derivation for investigating suspiciousness formulas impacted by coincidental correctness. We define three types of relations between formulas affected by coincidental correctness: namely, improved type, invariant type and uncertain type. We investigated 30 suspiciousness formulas using this framework and grouped them into three categories. Furthermore, we conduct an empirical study to verify the effectiveness of SBFL affected by coincidental correctness on four relatively large C programs. We proved that coincidental correctness has a positive effect on 23 out of these 30 formulas, no effect on 5 of them, and that the effect on the remaining 2 depends on certain conditions. The experimental results show that the effectiveness of some suspiciousness formulas can be enhanced and that of some suspiciousness formulas remains unchanged.
- Conference Article
39
- 10.1109/compsac.2014.32
- Jul 1, 2014
Although empirical studies have demonstrated the usefulness of statistical fault localization based on code coverage, the effectiveness of these techniques may deteriorate due to the presence of some undesired circumstances such as the existence of coincidental correctness, where one or more passing test cases exercise a faulty statement and thus cause some confusion in deciding whether the underlying exercised statement is faulty or not. Fault localization based on coverage can be improved if all possible instances of coincidental correctness are identified and proper strategies are employed to deal with these troublesome test cases. We introduce a technique to effectively identify coincidentally correct test cases. The proposed technique combines support vector machines and ensemble learning to detect mislabeled test cases, i.e., coincidentally correct test cases. The ensemble-based support vector machine can then be used to trim a test suite or flip the test status of the coincidentally correct test cases, thus improving the effectiveness of fault localization.
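The two mitigation strategies mentioned above, trimming the suite and flipping the test status, can be sketched on a small coverage matrix. The detection step (the SVM ensemble) is not reproduced; the set of detected CC tests is simply given, and all function names and data are hypothetical.

```python
def spectrum(coverage, passed, stmt):
    """Return (ef, ep, nf, np) for the statement at column `stmt`."""
    ef = ep = nf = np_ = 0
    for row, ok in zip(coverage, passed):
        if row[stmt]:
            if ok:
                ep += 1
            else:
                ef += 1
        elif ok:
            np_ += 1
        else:
            nf += 1
    return ef, ep, nf, np_

def trim(coverage, passed, cc_tests):
    """Drop the tests flagged as coincidentally correct."""
    keep = [i for i in range(len(passed)) if i not in cc_tests]
    return [coverage[i] for i in keep], [passed[i] for i in keep]

def flip(coverage, passed, cc_tests):
    """Relabel the tests flagged as coincidentally correct as failing."""
    return coverage, [False if i in cc_tests else ok for i, ok in enumerate(passed)]

# Rows are tests, columns are statements; statement 0 is the faulty one.
coverage = [[1, 1], [1, 0], [1, 1], [0, 1]]
passed = [False, True, True, True]
cc_tests = {1}  # assumed output of some CC detector

original = spectrum(coverage, passed, 0)
trimmed = spectrum(*trim(coverage, passed, cc_tests), 0)
flipped = spectrum(*flip(coverage, passed, cc_tests), 0)
print(original, trimmed, flipped)  # (1, 2, 0, 1) (1, 1, 0, 1) (2, 1, 0, 1)
```

Trimming lowers the number of passing executions of the faulty statement, while flipping additionally raises the number of failing executions. Both changes increase the score most spectrum-based formulas assign to that statement.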
- Research Article
4
- 10.1504/ijapr.2018.10013768
- Jan 1, 2018
- International Journal of Applied Pattern Recognition
Although empirical studies have confirmed the effectiveness of spectrum-based fault localisation (SBFL) techniques, their performance may be degraded due to the presence of some undesired circumstances such as the existence of coincidental correctness (CC), where one or more passing test cases exercise a faulty statement and thus cause some confusion in deciding whether the underlying exercised statement is faulty or not. This article aims at improving SBFL effectiveness by mitigating the effect of CC test cases. In this regard, a new method is proposed that uses a support vector machine (SVM) with a customised kernel function. To build the kernel function, we applied a new sequence-matching algorithm that measures the similarities between passing and failing executions. We conducted some experiments to assess the proposed method. The results show that our method can effectively improve the performance of SBFL techniques.
- Research Article
8
- 10.1016/j.jss.2023.111900
- Nov 18, 2023
- Journal of Systems and Software
Trace matrix optimization for fault localization
- Research Article
8
- 10.1007/s00607-018-0610-0
- Mar 22, 2018
- Computing
As spectra-based fault localization techniques report suspicious statements by analyzing the coverage of test cases, the effectiveness of the results is highly dependent on the composition of test suites. This paper proposes an approach for selecting a subset of the passed test suite when a failure is revealed by a failed test case. The goal is to obtain more effective fault localization using a minimal number of test cases than using the originally given large number of test cases. A novelty is that a prioritization criterion and a selection criterion are defined. Different from previous studies, the failed trace is fully considered. The prioritization criterion partitions statements in the failed trace into more suspicious and less suspicious, and then ranks passed test cases by their ability to distinguish the more suspicious statements from the less suspicious ones. The selection criterion selects the minimal passed test suite that maximizes the number of coverage equivalence classes in the failed trace, so as to distinguish the suspicious statements while reducing the size of the test suite. Another novelty is that our approach turns the test case selection into a multi-criteria optimization so that the prioritization and selection criteria complement each other. This approach was evaluated with 5 fault localization techniques, 8 subject programs and 35,392 test cases. The results show that the fault localization effectiveness can be significantly improved with less than 5% of the passed test cases. Our approach has advantages over the statement-based and vector-based test suite reduction approaches in both fault localization effectiveness and test suite reduction rate.
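The selection criterion can be approximated with a simple greedy procedure: two statements of the failed trace fall into the same coverage equivalence class when every selected passed test either covers both or covers neither. The sketch below is only a greedy stand-in for the paper's multi-criteria optimization and omits the prioritization criterion; the function names and data are hypothetical.

```python
def num_classes(failed_trace, selected, coverage):
    """Count groups of failed-trace statements the selected tests cannot tell apart."""
    signatures = {tuple(coverage[t][s] for t in selected) for s in failed_trace}
    return len(signatures)

def greedy_select(failed_trace, passed_tests, coverage):
    """Add passed tests one by one while each addition splits more classes."""
    selected = []
    best = num_classes(failed_trace, selected, coverage)
    remaining = list(passed_tests)
    while remaining:
        scored = [(num_classes(failed_trace, selected + [t], coverage), t) for t in remaining]
        top, pick = max(scored, key=lambda pair: pair[0])
        if top <= best:
            break  # no remaining test distinguishes any further statements
        selected.append(pick)
        remaining.remove(pick)
        best = top
    return selected, best

# coverage[t][s] is 1 if passed test t executes statement s.
coverage = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],  # same coverage as test 0, so it adds nothing
]
failed_trace = [0, 1, 2, 3]  # statements executed by the failed test
selected, classes = greedy_select(failed_trace, [0, 1, 2], coverage)
print(selected, classes)  # [0, 1] 4
```

Here two of the three passed tests are enough to separate all four statements of the failed trace, so the redundant third test is left out.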
- Research Article
58
- 10.1016/j.infsof.2012.01.006
- Jan 31, 2012
- Information and Software Technology
How well does test case prioritization integrate with statistical fault localization?
- Research Article
19
- 10.1142/s0218194013500186
- Jun 1, 2013
- International Journal of Software Engineering and Knowledge Engineering
Coverage-based fault localization techniques leverage coverage information to identify the faulty elements of a program. However, these techniques can be adversely affected by coincidental correctness, which occurs when the defect is executed but no failure is revealed. In this paper, we propose a clustering-based strategy to identify coincidental correctness in fault localization. The insight behind this strategy is that tests in the same cluster have similar behaviors. Thus, a passed test in a cluster with many failed tests is highly likely to be coincidentally correct, because it has the potential to execute the faulty elements as those failed ones do. We evaluated this technique from two aspects: the ability to identify coincidental correctness and the effectiveness in improving fault localization. The experimental results show that our strategy can alleviate the coincidental correctness problem and improve the effectiveness of fault localization.
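The insight above can be sketched with a very simple clustering scheme. The abstract does not name the clustering algorithm, so this sketch assumes a greedy "leader" clustering on Jaccard similarity of coverage vectors. The thresholds, function names, and data are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two binary coverage vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0

def leader_clusters(coverage, threshold):
    """Assign each test to the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of test indices; the first is its leader
    for i, row in enumerate(coverage):
        for cluster in clusters:
            if jaccard(coverage[cluster[0]], row) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def suspect_cc(coverage, passed, threshold=0.75, min_failed_ratio=0.5):
    """Flag passed tests that share a cluster with mostly failed tests."""
    suspects = []
    for cluster in leader_clusters(coverage, threshold):
        failed = sum(1 for i in cluster if not passed[i])
        if failed / len(cluster) >= min_failed_ratio:
            suspects.extend(i for i in cluster if passed[i])
    return sorted(suspects)

coverage = [
    [1, 1, 1, 0],  # test 0, failed
    [1, 1, 1, 0],  # test 1, failed
    [1, 1, 1, 1],  # test 2, passed but behaves like the failed tests
    [0, 0, 0, 1],  # test 3, passed
    [0, 0, 1, 1],  # test 4, passed
]
passed = [False, False, True, True, True]
print(suspect_cc(coverage, passed))  # [2]
```

Test 2 is flagged because its coverage is close to that of the two failed tests, while tests 3 and 4 form their own clusters without any failed tests.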