The Importance of Accounting for Execution Failures when Predicting Test Flakiness

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

Flaky tests are tests that pass and fail on different executions of the same version of a program under test. They waste valuable developer time by forcing developers to investigate false alerts (flaky test failures). To deal with this issue, many prediction methods have been proposed. However, the utility of these methods remains unclear, since they are typically evaluated on single-release data, ignoring that in many cases tests that fail flakily in one release also fail correctly (indicating the presence of bugs) in another release, meaning that subsequent correctly-failing cases may pass unnoticed. In this paper, we show that this situation is prevalent and raises significant concerns for both researchers and practitioners. In particular, we show that flaky tests, i.e., tests that exhibit flaky behaviour at some point in time, have a strong fault-revealing capability: they reveal more than one third of all encountered regression faults. We also show that 76.2% of all test executions that reveal faults in the codebase under test come from tests classified as flaky by existing prediction methods. Overall, our findings motivate the need for future research to focus on predicting flaky test executions instead of flaky tests.
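As a concrete illustration of the definition above, here is a minimal Python sketch (not from the paper; the test and detector are invented for illustration) in which a test's outcome depends on nondeterminism, and a naive rerun-based detector labels it flaky because it both passes and fails on the same program version:

```python
import random

def flaky_test():
    # Hypothetical test whose outcome depends on nondeterminism
    # (e.g., timing or ordering); modelled here with a random draw.
    return random.random() < 0.8  # passes roughly 80% of the time

def classify_by_rerun(test, runs=100, seed=0):
    """Label a test flaky if it both passes and fails across
    repeated executions of the same program version."""
    random.seed(seed)
    outcomes = {test() for _ in range(runs)}
    return "flaky" if outcomes == {True, False} else "stable"

print(classify_by_rerun(flaky_test))  # prints "flaky"
```

Note that rerun-based detection like this is exactly what single-release evaluations capture; it says nothing about whether a later failure of the same test is flaky or fault-revealing, which is the paper's point.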

Similar Papers
  • Research Article
  • Cited by 21
  • 10.1017/s0033291707001468
Memory and executive impairment in schizophrenia: comparison with frontal and temporal brain damage.
  • Sep 10, 2007
  • Psychological medicine
  • T J Ornstein + 2 more

Although poor neuropsychological test performance is well documented in schizophrenia, how closely it resembles that seen in patients with brain damage in terms of cognitive failures in daily life and stability over time has been little studied. Thirty patients with chronic schizophrenia, 24 patients with frontal or temporal brain damage and 30 healthy controls were given a battery of memory and executive tests. Carers of the two patient groups also completed questionnaires rating memory and executive failures in daily life. Testing was repeated 6 weeks later. The schizophrenia and the brain-damaged patients were significantly impaired on most, but not all tests. The degree of carer-rated memory or executive failure was similar in the two groups, but the schizophrenia patients were rated as having significantly more executive failures than memory failures, whereas the brain-damaged patients showed the reverse pattern. Both groups of patients showed similar consistency of performance across sessions. Neuropsychological impairment in schizophrenia resembles that seen in patients with brain damage, not only in terms of overall severity, but also in terms of stability and the degree to which poor test performance translates into cognitive failures in daily life.

  • Research Article
  • Cited by 5
  • 10.1016/j.robot.2022.104350
A hybrid skill parameterisation model combining symbolic and subsymbolic elements for introspective robots
  • Dec 25, 2022
  • Robotics and Autonomous Systems
  • Alex Mitrevski + 2 more

  • Conference Article
  • Cited by 16
  • 10.1145/3338906.3338966
Locating vulnerabilities in binaries via memory layout recovering
  • Aug 12, 2019
  • Haijun Wang + 7 more

Locating vulnerabilities is an important task for security auditing, exploit writing, and code hardening. However, it is challenging to locate vulnerabilities in binary code, because most program semantics (e.g., the boundaries of an array) are missing after compilation. Without program semantics, it is difficult to determine whether a memory access exceeds its valid boundaries in binary code. In this work, we propose an approach to locating vulnerabilities based on memory layout recovery. First, we collect a set of passed executions and one failed execution. Then, for the passed and failed executions, we restore their program semantics by recovering fine-grained memory layouts based on the memory addressing model. With the memory layouts recovered from passed executions as a reference, we can locate vulnerabilities in the failed execution by memory layout identification and comparison. Our experiments show that the proposed approach effectively locates vulnerabilities in 24 out of 25 DARPA CGC programs (96%) and can effectively classify 453 program crashes (in 5 Linux programs) into 19 groups based on their root causes.

  • Book Chapter
  • Cited by 8
  • 10.1007/11951957_28
Requirements-Driven Modeling of the Web Service Execution and Adaptation Lifecycle
  • Jan 1, 2006
  • N C Narendra + 1 more

The increasing popularity of Web services for application integration has resulted in a large body of research on Web service composition. However, the major lacuna so far in Web service composition is the lack of a holistic requirements-driven approach for modeling the entire Web service lifecycle, i.e., composition, joint execution, midstream adaptation in response to failures or changing requirements, and finally re-execution until successful completion. In this paper we present such an approach based on our earlier work on context-driven Web service modeling. In particular, we separate requirements into two parts – functional and extra-functional requirements (FRs and EFRs, respectively). We express FRs as commitments made by individual Web services towards the composite Web service, and EFRs as rules that constrain the behavior of the individual Web services while they execute against their FRs. We also show how midstream adaptation in Web service execution – caused either by changes in user requirements or execution failures – can be managed in our approach. We believe that ours is the first such approach towards a comprehensive modeling of requirements for composite Web service executions, and especially during adaptation.

  • Research Article
  • 10.37276/sjss.v6i1.542
Ineffectiveness of Industrial Relations Court Decision Execution: A Critical Analysis of Procedural Law Vacuum and the Urgency of Establishing a Special Execution Institution
  • Nov 29, 2025
  • SIGn Journal of Social Science
  • Dadan Herdiana + 2 more

The development of labor law in Indonesia aims to achieve social justice by protecting workers’ rights. However, the reality of law enforcement reveals a juridical anomaly in which Industrial Relations Court decisions that have obtained permanent legal force (inkracht van gewijsde) are frequently not executed (non-executable). Consequently, workers’ normative rights, such as the right to severance pay and unpaid wages, remain unfulfilled. This study aims to analyze the root causes of such execution failures and to formulate effective institutional solutions. Utilizing a normative juridical research method with statutory, conceptual, and case approaches, this research examines the dependence of Law Number 2 of 2004 on the Civil Procedure Law—specifically the HIR/RBg—which is passive and formalistic in nature. The results indicate that execution failure is caused by the burden of asset proof being placed entirely on the worker. Furthermore, the absence of the court’s investigative authority to conduct asset tracing, along with judges’ weak application of Conservatory Attachment (Conservatoir Beslag) and Penalty Payments (Dwangsom), exacerbates the situation. Dependence on the archaic HIR/RBg procedures proves incompatible with the characteristics of labor disputes that demand speed. This is worsened by the dynamics of non-standard employment relationships in the gig economy, which are vulnerable to asset stripping. This study concludes that without procedural law reform, Industrial Relations Court decisions remain merely illusory judgments. Therefore, the establishment of a Special Execution Unit within the Industrial Relations Court, with autonomous authority to access integrated asset data, is recommended. Additionally, the issuance of regulations mandating the application of Dwangsom on an ex officio basis in every condemnatory (condemnatoir) decision is necessary to guarantee legal certainty and substantive justice for workers.

  • Research Article
  • Cited by 13
  • 10.1007/bf01053809
Junk bonds, life insurer insolvency, and stock market reactions: The case of first executive corporation
  • Apr 1, 1994
  • Journal of Financial Services Research
  • Joseph A Fields + 3 more

The First Executive Corporation was the largest failure in the history of the life insurance industry. The company was one of the most aggressive purchasers of junk bonds through the 1980s and was the first of several large failures in the staid life insurance industry. In this article, we examine the effect of First Executive's failure on the value of companies in the life insurance industry. We find that the stock prices of other life insurance companies were negatively affected by the earnings announcement that preceded First Executive's failure. The magnitude of an individual company's reaction to First Executive's loss varies with the proportion of the company's assets invested in junk bonds, the proportion of its assets invested in real estate, and the financial strength of the company as measured by its A.M. Best rating.

  • Research Article
  • Cited by 5
  • 10.1007/s11761-015-0176-z
Towards a formal study of automatic failure recovery in protocol-based web service composition
  • Feb 22, 2015
  • Service Oriented Computing and Applications
  • Nardjes Menadjelia

In the protocol-based Web service composition, the runtime unavailability of component services may result in a failed execution of the composite. In the literature, multiple recovery heuristics have been proposed. This work provides a formal study and focuses on the complexity issues of the recovery problem in protocol-based Web service composition. A recovery is a process responsible for migrating the failed execution into an alternative execution of the composite that still has the ability to reach a final state. The alternative execution is called a recovery execution. Following a failure, several recovery executions may be available; the problem of finding the best recovery execution(s) is called the recovery problem. Several criteria may be used to determine the best recovery execution(s). In this work, we define the best recovery execution as the one that is attainable from the failed execution with a maximal number of invisible compensations with respect to the client, assuming that all transitions are compensatable. For a given recovery execution, we prove that the decision problem associated with computing the number of invisibly compensated transitions is NP-complete, and thus we conclude that deciding the best recovery execution is in $\Sigma_2^P$.

  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-642-39955-8_9
A Dynamic Approach to Isolating Erroneous Event Patterns in Concurrent Program Executions
  • Jan 1, 2013
  • Jing Xu + 3 more

Concurrency bugs are hard to find due to the nondeterministic behavior of concurrent programs. In this paper, we present an algorithm for isolating erroneous event patterns in concurrent program executions. Failed executions are characterized as sequences of switch points, which capture the interleaving of read and write events on shared variables. The algorithm takes the switch-point sequence of a failed execution as input and outputs erroneous event patterns. We implemented our algorithm and conducted an experimental evaluation on several Java benchmark programs. The results of our evaluation show that our approach can effectively and efficiently identify erroneous event patterns in failed executions.
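To make the general idea concrete, the following minimal Python sketch (the event format and all names are hypothetical, not the paper's algorithm) flags event patterns that occur in a failed interleaving but in no passing one:

```python
# Each execution is modelled as a sequence of (thread, op, variable)
# events on shared state; the data below is invented for illustration.
passed = [
    [("t1", "write", "x"), ("t1", "read", "x"), ("t2", "write", "x")],
    [("t1", "write", "x"), ("t2", "write", "x"), ("t1", "read", "x")],
]
failed = [("t2", "write", "x"), ("t1", "write", "x"), ("t1", "read", "x")]

def adjacent_pairs(events):
    # The interleaving is abstracted as its set of adjacent event pairs.
    return set(zip(events, events[1:]))

def suspicious_patterns(failed, passed):
    """Event pairs seen in the failed interleaving but never in any
    passing one -- a crude stand-in for erroneous-pattern isolation."""
    ok = set().union(*(adjacent_pairs(p) for p in passed))
    return adjacent_pairs(failed) - ok

print(suspicious_patterns(failed, passed))
```

Here the pattern "t2 writes x before t1's write" appears only in the failed run, so it is reported as suspicious; a real tool would of course work over recorded switch points rather than toy tuples.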

  • Conference Article
  • Cited by 7
  • 10.1109/valid.2010.8
Using Hardware Performance Counters for Fault Localization
  • Aug 1, 2010
  • Cemal Yilmaz

In this work, we leverage data collected from hardware performance counters as an abstraction mechanism for program executions and use these abstractions to identify likely causes of failures. Our approach can be summarized as follows: hardware counter data is collected from both successful and failed executions; the data collected from the successful executions is used to create normal-behavior models of programs; and deviations from these models observed in failed executions are scored and reported as likely causes of failures. The results of our experiments, conducted on three open source projects, suggest that the proposed approach can effectively prioritize the space of likely causes of failures, which can in turn improve the turnaround time for defect fixes.
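The deviation-scoring step described above can be sketched as follows; the counter names and readings are invented for illustration, and the score is a simple z-score against the passing-run model rather than the paper's actual scoring function:

```python
import statistics

# Hypothetical per-execution counter readings; the counter names are
# illustrative, not taken from the paper.
passing_runs = [
    {"branch_misses": 100, "cache_misses": 2000},
    {"branch_misses": 110, "cache_misses": 2100},
    {"branch_misses": 95,  "cache_misses": 1950},
]
failed_run = {"branch_misses": 105, "cache_misses": 9000}

def rank_deviations(passing, failed):
    """Score each counter by how far the failed run deviates from the
    normal-behavior model (mean/stdev over passing runs), and return
    the counters ranked from most to least deviant."""
    scores = {}
    for name in failed:
        values = [run[name] for run in passing]
        mu, sigma = statistics.mean(values), statistics.stdev(values)
        scores[name] = abs(failed[name] - mu) / sigma
    return sorted(scores, key=scores.get, reverse=True)

print(rank_deviations(passing_runs, failed_run))
# prints ['cache_misses', 'branch_misses']
```

In this toy data the failed run's cache-miss count is far outside the passing-run distribution, so it is ranked first as the likelier clue to the failure's cause.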

  • Research Article
  • 10.47772/ijriss.2025.910000517
Tahfiz Education for Visually Impaired Students: An In-Depth Analysis of Survey Findings from Teachers and Students on Barriers and Developments at the Tahfiz for the Visually Impaired (TVI)
  • Nov 18, 2025
  • International Journal of Research and Innovation in Social Science
  • Hussein Ali Abdullah Al-Thulaia + 4 more

This article critically examines the complex implementation of the Al-Quran Tahfiz (Memorization) program for Visually Impaired Students at the Tahfiz for the Visually Impaired (TVI), focusing on the analysis of perceptual divergence between teachers and students. The study employed quantitative field survey data collected from Teachers (N=3) and Students (N=7). The findings reveal a compelling Ecological Paradox: while the program achieves superior spiritual and personal efficacy (mean agreement exceeding 4.90), it confronts three critical, yet divergently prioritized, systemic challenges, indicating a failure in execution across various ecological system levels. Students identified the Absolute Lack of Modern Assistive Technology (Digital Braille) as the single most severe barrier (with an absolute mean of 5.00), representing a direct failure in the TVI's Microsystem. Conversely, teachers acknowledged the Urgent Need for Specialized Training in Inclusive Pedagogy as their top priority (4.67), signalling a deficit within the Mesosystem (the interface between teacher training and the institute). Furthermore, the exceptionally high student demand for Specialized Psychological Counselling Services (4.86) confirmed a deep emotional support gap. This paper provides an in-depth analysis of these divergences, framing them within Bronfenbrenner's Ecological Systems Theory, the principles of Maqasid Shariah (Objectives of Islamic Law), and Universal Design for Learning (UDL). It details an Action-Oriented Roadmap to translate the spiritual success of the program into sustainable technical and methodological competence at TVI.

  • Research Article
  • 10.12816/0018851
A Survey of the Effective Factors on Success or Failure of Executing Geographical Information Systems by Multiple - Criteria Decision Making Techniques
  • Aug 1, 2014
  • Kuwait Chapter of Arabian Journal of Business and Management Review
  • Ali Bazaee + 1 more

Geographical Information Systems (GIS) are receiving more attention as projects grow in complexity and dimension. Appropriate selection of the required system, successful implementation, and effective use of these systems are therefore concerns for top managers and IT managers of all organizations, and the implementation and execution of such systems is one of the main concerns in their success. The present study investigated GIS project success through a review of the literature and analytical methods. It also evaluated the main factors of project failure (relevant risks) so that successful implementation can be achieved by managing them. For data collection, library and field study (questionnaire) methods were used. Ranking of the factors agreed upon by experts to identify success or failure of execution was done using the Analytic Hierarchy Process (AHP), a decision-making method based on pairwise comparison. To investigate the mutual effects, the DEMATEL method was used. To investigate the factors affecting success or failure of GIS projects, hypotheses were tested. It can be said that various factors with different weights affect the various signs of failure of GIS execution. Indeed, all main factors such as organization condition, geography, human resources,

  • Conference Article
  • 10.1109/tencon.1993.320003
Prolog vs procedural backtracking
  • Oct 19, 1993
  • Yaowei Liu

Backtracking algorithms try to solve problems by generating and testing a set of possible solutions. Typically, there are some points in the execution of a backtracking algorithm where execution can proceed in more than one way. Such points are called choice points. The backtracking technique has been incorporated into variants of many high level languages, mostly procedural languages, such as Pascal, Fortran and LISP. Also, backtracking is one of the major features of Prolog. A programming language with built-in support for backtracking algorithms is called a backtracking language. Typically its semantics recognise success or failure of execution, and it supports programming of choice points. It hides details about how choices are made at choice points, how to backtrack, and how to resume the previous state. Backtracking languages find wide application in artificial intelligence algorithms. We compare the backtracking mechanisms of Prolog and other backtracking programming languages, and present a new backtracking language.

  • Research Article
  • Cited by 5
  • 10.1108/aaaj-11-2022-6164
New world, or out of this world? Columbus – an exploratory study of HASS and STEM success factors in the first “space” race
  • Jan 4, 2024
  • Accounting, Auditing & Accountability Journal
  • Richard M Kerslake + 1 more

Purpose: The purpose of this paper is to investigate the extent to which interdisciplinary (HASS, i.e. non-STEM) factors—in particular, accounting, stakeholder management and accountability—enable, influence and motivate large human exploration ventures, principally in maritime and space fields, utilizing Columbus’s and Chinese explorations of the 1400s as the primary setting.
Design/methodology/approach: The study analyzes archival data from narrative and interpretational history, including both academic and non-academic sources, that relate to two global historical events, the Columbus and Ming Chinese exploration eras (c. 1400–1500), as a parallel to the modern “Space Race”. Existing studies on pertinent HASS (Humanities and Social Sciences) and STEM (Science, Technology, Engineering and Mathematics) enablers, influencers and motivators are utilized in the analysis. The authors draw upon the concepts of stakeholder theory and the construct of accountability in their analysis.
Findings: Findings suggest that non-STEM considerations—politics, finance, accountability, culture, theology and others—played crucial roles in enabling Western Europe (Columbus) to reach the Americas before China or other global powers, demonstrating the pivotal importance of HASS factors in human advancements and exploration.
Research limitations/implications: In seeking to answer those questions, this study identifies only those factors (HASS or STEM) that may support the success or failure in execution of the exploration and development of a region such as the New World or Space. Moreover, the study has the following limitation. Relative successes, failures, drivers and enablers of exploratory ventures are drawn almost exclusively from the documented historical records of the nations, entities and individuals (China and Europe) who conducted those ventures. A paucity of objective sources in some fields, and the need to set appropriate boundaries for the study, also necessitate such limitation.
Practical implications: It is observable that many of those HASS factors also appear to have been influencers in modern era Space projects. For Apollo and Soyuz, success factors such as the relative economics of USA and USSR, their political ideologies, accountabilities and organizational priorities have clear echoes. What the successful voyages of Columbus and Apollo also have in common is an appetite to take risks for an uncertain return, whether as sponsor or voyager; an understanding of financial management and benefits measurement; and a leadership (Isabella I, John F. Kennedy) possessing a vision, ideology and governmental apparatus to further the venture’s goals.
Originality/value: Whilst various historical studies have examined influences behind the oceangoing explorations of the 1400s and the colonization of the “New World”, this article takes an original approach of analyzing those motivations and other factors collectively, in interdisciplinary terms (HASS and STEM). This approach also has the potential to provide a novel method of examining accountability and performance in modern exploratory ventures, such as crewed space missions.

  • Research Article
  • Cited by 2
  • 10.54783/ijsoc.v3i4.476
Implementation of Collaborative Governance in Public Policy Handling COVID-19
  • Dec 27, 2021
  • International Journal of Science and Society
  • Muhammad Musaad

Collaborative governance seems to be a response to failures in execution, exorbitant costs, and the politicization of public sector laws. The emphasis is on all stages of public policy. The unprecedented COVID-19 epidemic has compelled the government to be prepared to deal with it, and to do so quickly and effectively. To combat the epidemic, the Indonesian government has implemented a variety of strategies, including social restrictions, mandated immunizations, national economic recovery, and so on. However, the government cannot undertake collaborative governance alone; thus, cross-disciplinary and cross-field cooperation is required. As a result, the purpose of this research is to explore the application of collaborative governance in dealing with COVID-19 in Indonesia. This research combines a qualitative methodology with a descriptive approach. According to the findings of the research, collaborative governance in the management of COVID-19 has four key values: consensus orientation, collective leadership, multi-way communication, and resource sharing. Collaborative governance in COVID-19 management may take the form of, among other things, publicizing the hazards of COVID-19, providing masks, creating and spraying disinfectants, and distributing hand sanitizers.

  • Conference Article
  • Cited by 7
  • 10.1109/jictee.2014.6804098
Analysis of text-based CAPTCHA images using Template Matching Correlation technique
  • Mar 1, 2014
  • Promprawatt Sakkatos + 3 more

Text-based CAPTCHA images have been widely utilized in online applications to counter malicious programs that attempt to cause failures in execution or computation. Although installing a CAPTCHA enhances a system's security, it has to be continuously analysed, improved and developed to resist decoding or extraction by automated intrusion programs. This paper focuses on the examination of text-based CAPTCHA images with several degrees of noise, skew, and font type and size. The Template Matching Correlation (TMC) technique, consisting of image conversion, thresholding, noise rejection, segmentation and recognition steps, is introduced for the analysis. Simulation results show that robustness increases after the image is distorted by background noise in the range of 0.3 to 0.4 and font skew of 10° to 15°, while remaining easily recognizable by humans.
