Articles published on Coincidental Correctness
21 Search results
- Research Article
- 10.1016/j.infsof.2025.107978
- Mar 1, 2026
- Information and Software Technology
- Jian Hu
Representation learning for coincidental correctness in fault localization
- Research Article
- 10.1109/tr.2026.3668421
- Jan 1, 2026
- IEEE Transactions on Reliability
- Tao Zhang + 5 more
Coincidental Correctness (CC) arises when a test case executes a faulty entity in a program without causing a failure. This phenomenon injects noise into coverage information, as CC tests weaken the connection between faulty entities and test failures. Since many fault localization (FL) approaches rely on analyzing test execution traces to locate faulty entities, the compromised reliability of test results directly undermines FL accuracy. Furthermore, the detrimental effects of CC extend beyond fault localization to subsequent software maintenance tasks such as automatic program repair. Identifying and mitigating CC tests is therefore critical not only for enhancing FL but also for ensuring robust software quality assurance. We propose FusionCC, an approach that applies multiscale coverage features and handcrafted features to fuse complementary feature representations for CC test case detection. Specifically, FusionCC first refines the original coverage data by filtering out noisy, irrelevant elements, then extracts multiscale features from the refined matrix, and finally fuses the coverage and handcrafted features to generate highly informative feature representations for CC detection. FusionCC realizes a comprehensive fusion of complementary features across different scales and from diverse sources, which significantly enhances the accuracy of CC detection. To evaluate the effectiveness of FusionCC, we conduct large-scale experiments on 277 faulty versions of six representative benchmarks. The experimental results show that FusionCC significantly improves CC detection (e.g., average improvements of 50.93% in precision and 82.03% in $F_1$ value compared to state-of-the-art CC detection approaches) and fault localization effectiveness (e.g., an average of 10.33, 19.33, and 25.67 faults can be found in terms of the Top-1, Top-3, and Top-5 metrics under the relabel strategy, compared with state-of-the-art FL approaches).
- Research Article
4
- 10.1109/tse.2024.3481893
- Dec 1, 2024
- IEEE Transactions on Software Engineering
- Huan Xie + 6 more
Coincidental correctness (CC) is a situation in which, during the execution of a test case, the buggy entity is executed but the program behaves correctly as expected. Many automated fault localization (FL) techniques use runtime information to discover the underlying connection between the executed buggy entity and the failing test result. The existence of CC weakens this connection, misleads FL algorithms into building inaccurate models, and consequently decreases localization accuracy. To alleviate the adverse effect of CC on FL, CC detection techniques have been proposed to identify possible CC tests via heuristic or machine learning algorithms. However, their precision is not satisfactory, since they overestimate the possible CC tests and are insufficient in learning deep semantic features. In this work, we propose a novel Triplet network-based Coincidental Correctness detection technique (i.e., TriCoCo) to overcome the limitations of prior work. TriCoCo narrows the possible CC tests by designing three features to identify genuine passing tests. Instead of using all tests as inputs, as existing techniques do, TriCoCo takes the identified genuine passing tests and the failing ones to train a triplet model that can evaluate their relative distance. Finally, TriCoCo uses the trained triplet model to infer, for each of the remaining passing tests, the probability that it is a CC test. We conduct large-scale experiments to evaluate TriCoCo on the widely used Defects4J benchmark. The results demonstrate that TriCoCo can improve not only the precision of CC detection but also the effectiveness of FL techniques, e.g., the precision of TriCoCo is 80.33% on average, and TriCoCo boosts the efficacy of DStar by 18%–74% in terms of the MFR metric when compared to seven state-of-the-art CC detection baselines.
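The abstract above does not describe the loss function, the embedding network, or how triplets are formed. As a rough illustration of what "evaluating relative distance" with a triplet model usually involves, the following sketch shows the standard margin-based triplet loss on hypothetical embedding vectors; the function names, the margin value, and the vectors are all assumptions, not TriCoCo's implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (a hypothetical default of 1.0 here).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings of three test executions.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]      # same class as the anchor
far_negative = [3.0, 4.0]  # other class, already well separated
near_negative = [0.0, 1.5] # other class, still too close

loss_far = triplet_loss(anchor, positive, far_negative)
loss_near = triplet_loss(anchor, positive, near_negative)
```

In this toy setup only the near negative contributes a non-zero loss, which is the signal a triplet model would use to push the two classes further apart.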
- Research Article
8
- 10.1016/j.jss.2023.111900
- Nov 18, 2023
- Journal of Systems and Software
- Jian Hu
Trace matrix optimization for fault localization
- Research Article
5
- 10.1007/s00530-022-01039-w
- Dec 24, 2022
- Multimedia Systems
- Heling Cao + 5 more
A coincidental correctness test case identification framework with fuzzy C-means clustering
- Research Article
2
- 10.3390/sym14061267
- Jun 19, 2022
- Symmetry
- Heling Cao + 2 more
An important research question in Spectrum-Based Fault Localization (SBFL) is which factors influence the effectiveness of suspiciousness formulas, viewed from the perspective of symmetry. Coincidental correctness is one of the most important of these factors. The influence of coincidental correctness on fault localization has attracted a large amount of empirical research; however, such studies can hardly be considered sufficiently comprehensive given the large number of symmetrical suspiciousness formulas. Therefore, we first develop a theoretical framework that uses function derivation to investigate how suspiciousness formulas are affected by coincidental correctness. We define three types of relations between formulas affected by coincidental correctness: improved, invariant, and uncertain. We investigate 30 suspiciousness formulas using this framework and group them into these three categories. Furthermore, we conduct an empirical study on four relatively large C programs to verify the effectiveness of SBFL affected by coincidental correctness. We prove that coincidental correctness has a positive effect on 23 of these 30 formulas, no effect on 5 of them, and an effect on the remaining 2 that depends on certain conditions. The experimental results show that the effectiveness of some suspiciousness formulas can be enhanced while that of others remains unchanged.
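The abstract above does not list the 30 formulas or give its derivations. As a small illustration of how a suspiciousness formula reacts to coincidentally correct tests, the sketch below uses Ochiai, a widely used SBFL formula chosen here as an assumption. The spectrum counts are made up: `ef` and `nf` are the failing tests that do and do not execute the statement, and `ep` is the passing tests that execute it.

```python
import math


def ochiai(ef: int, nf: int, ep: int) -> float:
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep))."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


# Hypothetical spectrum of the faulty statement.
ef, nf, ep = 4, 0, 12
cc_tests = 8  # passing tests that executed the fault (coincidentally correct)

score_with_cc = ochiai(ef, nf, ep)
# Removing the CC tests lowers ep, which shrinks the denominator.
score_cleansed = ochiai(ef, nf, ep - cc_tests)
```

With these numbers the faulty statement's score rises from 0.5 to about 0.71 once the CC tests are removed, because every CC test inflates `ep` for the faulty statement.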
- Research Article
14
- 10.1002/stvr.1762
- Jan 9, 2021
- Software Testing, Verification and Reliability
- Rawad Abou Assi + 2 more
According to the reachability–infection–propagation (RIP) model, three conditions must be satisfied for program failure to occur: (1) the defect's location must be reached, (2) the program's state must become infected and (3) the infection must propagate to the output. Weak coincidental correctness (or weak CC) occurs when the program produces the correct output, while condition (1) is satisfied but conditions (2) and (3) are not satisfied. Strong coincidental correctness (or strong CC) occurs when the output is correct, while both conditions (1) and (2) are satisfied but not (3). The prevalence of CC was previously recognized. In addition, the potential for its negative effect on spectrum-based fault localization (SBFL) was analytically demonstrated; however, this was not empirically validated. Using Defects4J, this paper empirically studies the impact of weak and strong CC on three well-researched coverage-based fault detection and localization techniques, namely, test suite reduction (TSR), test case prioritization (TCP) and SBFL. Our study, which involved 52 SBFL metrics, provides the following empirical evidence. (i) The negative impact of CC tests on TSR and TCP is very significant. In addition, cleansing the CC tests was observed to yield (a) a 100% TSR defect detection rate for all subject programs and (b) an improvement of TCP for over 92% of the subjects. (ii) The impact of CC tests on SBFL varies widely w.r.t. the metric used. The negative impact was strong for 11 metrics, mild for 37, non-measurable for 1 and non-existent for 3 metrics. Interestingly, the negative impact was mild for the 9 most popular and/or most effective SBFL metrics. In addition, cleansing the CC tests resulted in the deterioration of SBFL for a considerable number of subject programs. (iii) Increasing the proportion of CC tests has a limited impact on TSR, TCP and SBFL. Interestingly, for TSR and TCP and 11 SBFL metrics, small and large proportions of CC tests are strongly harmful. (iv) Lastly, weak and strong CC are equally detrimental in the context of TSR, TCP and SBFL.
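The weak/strong distinction defined in the abstract above can be written down as a small decision function. This is only a sketch of those definitions: the `Outcome` enum and `classify` function are hypothetical names, and the three booleans are assumed to be known for each test run, which in practice requires instrumentation the abstract does not describe.

```python
from enum import Enum


class Outcome(Enum):
    NOT_REACHED = "defect not reached"
    WEAK_CC = "weak coincidental correctness"
    STRONG_CC = "strong coincidental correctness"
    FAILURE = "failure"


def classify(reached: bool, infected: bool, propagated: bool) -> Outcome:
    """Map the three RIP conditions of one test run to an outcome."""
    # Infection requires reaching the defect; propagation requires infection.
    if (infected and not reached) or (propagated and not infected):
        raise ValueError("inconsistent RIP conditions")
    if not reached:
        return Outcome.NOT_REACHED
    if not infected:
        return Outcome.WEAK_CC    # (1) holds, (2) and (3) do not
    if not propagated:
        return Outcome.STRONG_CC  # (1) and (2) hold, (3) does not
    return Outcome.FAILURE        # all three hold
```

For example, `classify(True, True, False)` returns `Outcome.STRONG_CC`: the defect was executed and corrupted the state, but the corruption never reached the output.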
- Research Article
18
- 10.1007/s10664-020-09859-y
- Aug 17, 2020
- Empirical Software Engineering
- Farid Feyzi
In this article we emphasize that most faults appearing in real-world programs are complicated and that there is a high degree of interaction between faulty and other correlated statements, which is likely to cause coincidental correctness in many cases. To effectively diminish the negative impact of coincidentally correct tests on localization effectiveness, we suggest analyzing the combinatorial effect of program statements on the failure. To this end, we develop a new framework, CGT-FL, for evaluating and ranking program statements such that statements that have strong discriminatory power as a group but are weak as individuals can be identified. The framework first evaluates the interactivity degree of each statement according to its influence on the intricate interrelation among statements, using a Shapley value-based cooperative game-theoretic method. Then, statements are selected in a forward way by considering both interactivity and relevance measures. To verify the effectiveness of CGT-FL, we provide the results of our extensive experiments with different subject programs containing seeded and real faults. The experimental results are then compared with those of different fault localization techniques for both single-fault and multiple-fault programs. The results show that CGT-FL outperforms state-of-the-art techniques.
- Research Article
19
- 10.1016/j.jss.2020.110635
- May 12, 2020
- Journal of Systems and Software
- Arash Sabbaghi + 2 more
FCCI: A fuzzy expert system for identifying coincidental correct test cases
- Research Article
1
- 10.1002/stvr.1698
- Mar 21, 2019
- Software Testing, Verification and Reliability
- Jeff Offutt
I love journal papers and you should too
- Research Article
33
- 10.1007/s00607-018-0591-z
- Jan 29, 2018
- Computing
- Farid Feyzi + 1 more
Despite the proven applicability of spectrum-based fault localization (SBFL) methods, their effectiveness may be degraded by coincidental correctness, which occurs when faults fail to propagate, i.e., their execution does not result in failures. This article aims at improving SBFL effectiveness by mitigating the effect of coincidentally correct test cases. Given a test suite in which each test has been classified as failing or passing and each faulty program has a single bug, we present a program slicing-based technique to identify a set of program entities that directly affect the program output when executed with failing test cases, called failure candidate causes (FCC). We then use the FCC set to identify test cases that can be marked as coincidentally correct. These tests are identified based on two heuristics: the average suspiciousness score of the statements that directly affect the program output and the coverage ratio of those statements. To evaluate our approach, we used several evaluation metrics and conducted extensive experiments on programs containing single and multiple bugs, including both real and seeded faults. The empirical results demonstrate that the proposed heuristics can alleviate the coincidental correctness problem and improve the accuracy of SBFL techniques.
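The two heuristics named in the abstract above can be sketched for a single passing test as follows. The abstract does not say how the FCC set is computed, how the two values are combined, or which cut-offs are used, so the function names, the threshold parameters `avg_min` and `ratio_min`, and the sample data are all assumptions made for illustration.

```python
def cc_heuristics(covered, fcc, susp):
    """Return (average suspiciousness, coverage ratio) over the FCC
    statements that one passing test covers.

    covered: set of statement ids executed by the test
    fcc:     set of failure-candidate-cause statement ids (assumed given)
    susp:    dict mapping statement id -> suspiciousness score
    """
    hit = covered & fcc
    ratio = len(hit) / len(fcc) if fcc else 0.0
    avg = sum(susp.get(s, 0.0) for s in hit) / len(hit) if hit else 0.0
    return avg, ratio


def looks_coincidentally_correct(covered, fcc, susp, avg_min=0.6, ratio_min=0.5):
    """Flag a passing test when both heuristics reach hypothetical cut-offs."""
    avg, ratio = cc_heuristics(covered, fcc, susp)
    return avg >= avg_min and ratio >= ratio_min


# Made-up example: the test covers two of the four FCC statements.
fcc = {2, 5, 7, 9}
susp = {2: 0.9, 5: 0.7, 7: 0.2}
avg, ratio = cc_heuristics({1, 2, 5}, fcc, susp)
flagged = looks_coincidentally_correct({1, 2, 5}, fcc, susp)
```

Here the test covers half of the FCC statements with an average suspiciousness of about 0.8, so it would be flagged under the assumed thresholds.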
- Research Article
4
- 10.1504/ijapr.2018.10013768
- Jan 1, 2018
- International Journal of Applied Pattern Recognition
- Farid Feyzi + 1 more
Although empirical studies have confirmed the effectiveness of spectrum-based fault localisation (SBFL) techniques, their performance may be degraded by undesired circumstances such as coincidental correctness (CC), where one or more passing test cases exercise a faulty statement and thus make it harder to decide whether the exercised statement is faulty. This article aims at improving SBFL effectiveness by mitigating the effect of CC test cases. To this end, a new method is proposed that uses a support vector machine (SVM) with a customised kernel function. To build the kernel function, we applied a new sequence-matching algorithm that measures the similarities between passing and failing executions. We conducted experiments to assess the proposed method. The results show that our method can effectively improve the performance of SBFL techniques.
- Research Article
8
- 10.1504/ijapr.2018.092520
- Jan 1, 2018
- International Journal of Applied Pattern Recognition
- Farid Feyzi + 1 more
Although empirical studies have confirmed the effectiveness of spectrum-based fault localization (SBFL) techniques, their performance may be degraded by undesired circumstances such as coincidental correctness (CC), where one or more passing test cases exercise a faulty statement and thus make it harder to decide whether the exercised statement is faulty. This article aims at improving SBFL effectiveness by mitigating the effect of CC test cases. To this end, a new method is proposed that uses a support vector machine (SVM) with a customized kernel function. To build the kernel function, we applied a new sequence-matching algorithm that measures the similarities between passing and failing executions. We conducted experiments to assess the proposed method. The results show that our method can effectively improve the performance of SBFL techniques.
- Research Article
2
- 10.1142/s0218194017500152
- Apr 1, 2017
- International Journal of Software Engineering and Knowledge Engineering
- Zhang Hui
In coverage-based fault localization with regression test cases, the suspiciousness ranking is seriously affected by similar test cases, the isolated treatment of each program entity, coincidentally correct test cases, and poor-quality regression test cases, all of which reduce the efficiency of fault localization. This paper proposes a method that combines control dependence and data dependence, the test case coverage table, and test results with a vector angular cosine weight to solve these problems. Experimental results show that the proposed method localizes faults more efficiently than the Tarantula, Jaccard, Ochiai, PPDG and CP methods.
- Research Article
7
- 10.3923/jse.2014.328.344
- Sep 15, 2014
- Journal of Software Engineering
- Yihan Li + 1 more
Identifying Coincidental Correctness in Fault Localization via Cluster Analysis
- Research Article
98
- 10.1145/2559932
- Feb 1, 2014
- ACM Transactions on Software Engineering and Methodology
- Wes Masri + 1 more
Researchers have argued that for failure to be observed the following three conditions must be met: $C_R$ = the defect was reached; $C_I$ = the program has transitioned into an infectious state; and $C_P$ = the infection has propagated to the output. Coincidental Correctness (CC) arises when the program produces the correct output while condition $C_R$ is met but not $C_P$. We recognize two forms of coincidental correctness, weak and strong. In weak CC, $C_R$ is met, whereas $C_I$ might or might not be met, whereas in strong CC, both $C_R$ and $C_I$ are met. In this work we first show that CC is prevalent in both of its forms and demonstrate that it is a safety reducing factor for Coverage-Based Fault Localization (CBFL). We then propose two techniques for cleansing test suites from coincidental correctness to enhance CBFL, given that the test cases have already been classified as failing or passing. We evaluated the effectiveness of our techniques by empirically quantifying their accuracy in identifying weak CC tests. The results were promising; for example, the better performing technique, using 105 test suites and statement coverage, exhibited 9% false negatives, 30% false positives, and no false negatives nor false positives in 14.3% of the test suites. Also using 73 test suites and more complex coverage, the numbers were 12%, 19%, and 15%, respectively.
- Research Article
19
- 10.1142/s0218194013500186
- Jun 1, 2013
- International Journal of Software Engineering and Knowledge Engineering
- Yi Miao + 4 more
Coverage-based fault localization techniques leverage coverage information to identify the faulty elements of a program. However, these techniques can be adversely affected by coincidental correctness, which occurs when the defect is executed but no failure is revealed. In this paper, we propose a clustering-based strategy to identify coincidental correctness in fault localization. The insight behind this strategy is that tests in the same cluster have similar behaviors. Thus a passed test in a cluster with many failed tests is highly likely to be coincidentally correct, because it has the potential to execute the faulty elements as those failed ones do. We evaluated this technique from two aspects: its ability to identify coincidental correctness and its effectiveness in improving fault localization. The experimental results show that our strategy can alleviate the coincidental correctness problem and improve the effectiveness of fault localization.
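The insight stated in the abstract above (a passed test that shares a cluster with many failed tests is a CC candidate) can be sketched as a simple filter. The clustering itself is assumed to have been done already by some algorithm the abstract does not name; the function name, the `min_failed_fraction` cut-off, and the sample data are hypothetical.

```python
from collections import defaultdict


def flag_cc_candidates(clusters, passed, min_failed_fraction=0.5):
    """Return the passed tests that sit in failure-dominated clusters.

    clusters: dict mapping test name -> cluster id (assumed precomputed)
    passed:   dict mapping test name -> True if the test passed
    """
    members = defaultdict(list)
    for test, cluster_id in clusters.items():
        members[cluster_id].append(test)

    flagged = []
    for tests in members.values():
        failed = sum(1 for t in tests if not passed[t])
        # A cluster dominated by failures makes its passed tests suspect.
        if failed / len(tests) >= min_failed_fraction:
            flagged.extend(t for t in tests if passed[t])
    return sorted(flagged)


# Made-up example: cluster 0 holds two failed tests and one passed test.
clusters = {"t1": 0, "t2": 0, "t3": 0, "t4": 1, "t5": 1}
passed = {"t1": False, "t2": False, "t3": True, "t4": True, "t5": True}
candidates = flag_cc_candidates(clusters, passed)
```

In this example only `t3` is flagged, since it is the one passed test that behaves like the failed tests around it.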
- Research Article
35
- 10.1109/mc.2012.185
- Jun 1, 2012
- Computer
- Zhenyu Zhang + 2 more
Fault localization commonly relies on both passed and failed runs, but passed runs are generally susceptible to coincidental correctness and modern software automatically produces a huge number of bug reports on failed runs. FOnly is an effective new technique that relies only on failed runs to locate faults statistically.
- Research Article
- 10.1002/spe.2104
- Jan 16, 2012
- Software: Practice and Experience
- T H Tse
Software systems today are large and complex. At the same time, the time to market is extremely short because of competition. As a result, program debugging for real-life systems is very difficult. In general, the debugging process consists of three tasks, namely, fault localization, fault repair, and retesting. In particular, fault localization is generally considered to be the most challenging. It is recognized as time-consuming and tedious if conducted manually. On the other hand, formal methods suffer from scalability problems, and static techniques are imprecise. Automatic statistical fault localization techniques are regarded as the most promising option. They compare passed and failed executions of a faulty program and produce a suspiciousness ranking of program entities (such as statements or predicates). Developers may then follow up with the list sequentially to identify program faults. Unfortunately, although a large number of statistical fault localization techniques are available, they have not reached the maturity to pinpoint accurately and precisely the locations of faults. Also, the recording and replaying of passed and failed executions as well as fault repair without introducing new bugs remain unresolved issues. Furthermore, researchers often make unrealistic assumptions, and software subjects under study do not necessarily reflect the fault characteristics of large industrial applications. There is plenty of room for improvement. The 2nd International Workshop on Program Debugging (IWPD 2011) was a full day workshop held in conjunction with the 35th Annual International Computer Software and Applications Conference (COMPSAC 2011) in Munich, Germany in July 2011. It serves as a platform for researchers and practitioners to exchange ideas, present new advancements, and identify further challenges in program debugging. 
It brings to light the latest challenges and advances in research and practice in program debugging, with a special emphasis on methodology, technology, and environment. Two keynote speeches were given by internationally renowned researchers — T. Y. Chen of Swinburne University of Technology, Australia and W. K. Chan of City University of Hong Kong, Hong Kong. There were also sessions for paper presentations and panel discussions. We shortlisted three papers from the workshop and invited the authors to submit an extended version to Software: Practice and Experience. Two papers were accepted for this focus section after going through two rounds of rigorous reviews involving two to three anonymous reviewers for each article. Both accepted papers address the important area of statistical fault location. The first paper, entitled ‘In quest of the science in statistical fault localization’ by W. K. Chan and Yan Cai, is an extended version of the keynote speech delivered by the first author in IWPD 2011. A vital element in research is to know the shortcomings of the current state of the art. In this paper, the authors conduct a critical review of existing work on statistical fault localization (including their own), highlight misconceptions and unnecessary assumptions, and provide remedial measures to rectify such malpractices. The authors point out that a lot of current research in statistical fault localization does not consider coincidental correctness, which means that the execution of a faulty statement may not necessarily lead to a program failure, even though this important concept has been known to software testers for decades. Also, existing fault localization techniques compare the similarities and dissimilarities between passed and failed executions to locate faults. These similarity coefficients estimate the probability that a particular program entity causes a failure, but ignore the noise caused by other entities. 
The authors point out the importance of a noise-reduction mechanism for the similarity coefficients. Another issue is that existing researchers often assume that they are dealing with large samples, where the central limit theorem applies. Empirical studies by the authors show that this assumption is often invalid. It is unrealistic to expect the availability of execution profiles with thousands of test verdicts for the average programs. A developer needs to debug a program even if a small number of failures have been revealed. When the number of samples is small, nonparametric statistical techniques should be applied. The authors conclude the paper by giving an insightful summary of the challenges in statistical fault localization that may benefit researchers in software engineering and related software areas. The second paper is entitled ‘A consensus-based strategy to improve the quality of fault localization’ by Vidroha Debroy and W. Eric Wong. Quite a number of statistical fault localization techniques have been proposed. Each of them claims to be superior to others in one aspect or another using different data sets. There is, however, no single technique that is definitely better than others in all aspects. In this paper, the authors put forward an integrated approach to address the issue. Rather than proposing yet another new technique that captures the more promising features of existing techniques, the authors propose a consensus-based strategy, which combines the rankings of several techniques. Using the Borda method, a consolidated ranking is produced by integrating various statement rankings that result from individual techniques. The scale of the proposed approach can be easily extended or retracted because new fault localization techniques can be added by the inclusion of their rankings, or existing techniques can be excluded by the removal of their rankings. 
Also, because different techniques operate on the same input data set, the overhead of the consensus is minimal. The overall ranking can be determined in linear time. The effectiveness of the consensus-based approach has been validated using three popular fault localization techniques (Tarantula, Ochiai, and H3) on the Siemens suite of programs as well as the Ant, grep, gzip, make, and space programs. The empirical study shows that the performance of the proposed approach is close to the best results of the techniques under study. Finally, I would like to thank Professor Nigel Horspool and Professor Andy Wellings, Editors of Software: Practice and Experience, for kindly agreeing to publish this focus section.
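The consensus idea described in the editorial above, combining several statement rankings with the Borda method, can be sketched as follows. This is the textbook Borda count with an alphabetical tie-break; the exact variant and tie handling used by Debroy and Wong are not given in the text, so both are assumptions, and the three rankings are invented.

```python
def borda_consensus(rankings):
    """Merge rankings (most suspicious first) into one consensus ranking.

    Each ranking awards n - 1 points to its first statement, n - 2 to the
    second, and so on down to 0, where n is the length of that ranking.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, stmt in enumerate(ranking):
            scores[stmt] = scores.get(stmt, 0) + (n - 1 - position)
    # Highest total first; ties broken by statement name (an assumption).
    return sorted(scores, key=lambda s: (-scores[s], s))


# Invented rankings standing in for three techniques' outputs.
tarantula = ["s3", "s1", "s2", "s4"]
ochiai = ["s1", "s3", "s2", "s4"]
h3 = ["s3", "s2", "s1", "s4"]

consensus = borda_consensus([tarantula, ochiai, h3])
```

Each ranking is scanned once to accumulate points, and adding or dropping a technique only means adding or removing one list, which matches the extensibility argument made in the text.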
- Research Article
46
- 10.1145/1151695.1151696
- Jul 1, 2006
- ACM Transactions on Software Engineering and Methodology
- R M Hierons
In partition analysis we divide the input domain to form subdomains on which the system's behaviour should be uniform. Boundary value analysis produces test inputs near each subdomain's boundaries to find failures caused by incorrect implementation of the boundaries. However, boundary value analysis can be adversely affected by coincidental correctness, in which the system produces the expected output, but for the wrong reason. This article shows how boundary value analysis can be adapted in order to reduce the likelihood of coincidental correctness. The main contribution is to cases of automated test data generation in which we cannot rely on the expertise of a tester.