The Importance of Accounting for Execution Failures when Predicting Test Flakiness
Flaky tests are tests that pass and fail across different executions of the same version of a program under test. They waste valuable developer time by forcing developers to investigate false alerts (flaky test failures). To deal with this issue, many prediction methods have been proposed. However, the utility of these methods remains unclear, since they are typically evaluated on single-release data. This ignores that, in many cases, tests that fail flakily in one release also fail correctly (indicating the presence of bugs) in another, so dismissing them allows subsequent correctly-failing executions to pass unnoticed. In this paper, we show that this situation is prevalent and raises significant concerns for both researchers and practitioners. In particular, we show that flaky tests, i.e., tests that exhibit flaky behaviour at some point in time, have a strong fault-revealing capability: they reveal more than one third of all encountered regression faults. We also show that 76.2% of all test executions that reveal faults in the codebase under test are made by tests that existing prediction methods classify as flaky. Overall, our findings motivate the need for future research to focus on predicting flaky test executions instead of flaky tests.
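The core distinction the abstract draws can be sketched as follows. This is a minimal illustration, not the paper's prediction method: it assumes hypothetical rerun data for a single code version, and classifies a test as flaky when its repeated runs show mixed outcomes.

```python
# Sketch: classify tests as flaky vs. consistent from rerun outcomes on a
# single program version. Test names and outcome data are hypothetical.

def classify_tests(outcomes):
    """outcomes: dict mapping test name -> list of 'pass'/'fail' results
    from repeated runs of the same code version."""
    flaky, consistent = set(), set()
    for test, results in outcomes.items():
        if 'pass' in results and 'fail' in results:
            flaky.add(test)       # mixed outcomes on one version => flaky
        else:
            consistent.add(test)  # always passes or always fails
    return flaky, consistent

runs = {
    'test_login':  ['pass', 'fail', 'pass'],  # flaky
    'test_parser': ['fail', 'fail', 'fail'],  # consistently failing
    'test_render': ['pass', 'pass', 'pass'],  # stable
}
flaky, consistent = classify_tests(runs)
print(sorted(flaky))       # ['test_login']
print(sorted(consistent))  # ['test_parser', 'test_render']
```

The paper's point is that a consistently failing run of a historically flaky test (like `test_parser` here, had it been flaky earlier) may reveal a real fault, which is why predicting flaky *executions* matters more than labelling whole tests.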
- Research Article
- 21
- 10.1017/s0033291707001468
- Sep 10, 2007
- Psychological medicine
Although poor neuropsychological test performance is well documented in schizophrenia, how closely it resembles that seen in patients with brain damage in terms of cognitive failures in daily life and stability over time has been little studied. Thirty patients with chronic schizophrenia, 24 patients with frontal or temporal brain damage and 30 healthy controls were given a battery of memory and executive tests. Carers of the two patient groups also completed questionnaires rating memory and executive failures in daily life. Testing was repeated 6 weeks later. The schizophrenia and the brain-damaged patients were significantly impaired on most, but not all tests. The degree of carer-rated memory or executive failure was similar in the two groups, but the schizophrenia patients were rated as having significantly more executive failures than memory failures, whereas the brain-damaged patients showed the reverse pattern. Both groups of patients showed similar consistency of performance across sessions. Neuropsychological impairment in schizophrenia resembles that seen in patients with brain damage, not only in terms of overall severity, but also in terms of stability and the degree to which poor test performance translates into cognitive failures in daily life.
- Research Article
- 5
- 10.1016/j.robot.2022.104350
- Dec 25, 2022
- Robotics and Autonomous Systems
A hybrid skill parameterisation model combining symbolic and subsymbolic elements for introspective robots
- Conference Article
- 16
- 10.1145/3338906.3338966
- Aug 12, 2019
Locating vulnerabilities is an important task for security auditing, exploit writing, and code hardening. However, it is challenging to locate vulnerabilities in binary code, because most program semantics (e.g., boundaries of an array) is missing after compilation. Without program semantics, it is difficult to determine whether a memory access exceeds its valid boundaries in binary code. In this work, we propose an approach to locate vulnerabilities based on memory layout recovery. First, we collect a set of passed executions and one failed execution. Then, for passed and failed executions, we restore their program semantics by recovering fine-grained memory layouts based on the memory addressing model. With the memory layouts recovered in passed executions as reference, we can locate vulnerabilities in failed execution by memory layout identification and comparison. Our experiments show that the proposed approach is effective at locating vulnerabilities on 24 out of 25 DARPA’s CGC programs (96%), and can effectively classify 453 program crashes (in 5 Linux programs) into 19 groups based on their root causes.
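The comparison step described above can be illustrated with a toy sketch. This is not the paper's recovery algorithm; it assumes the memory regions have already been recovered (as hypothetical `(base, size)` pairs) and simply flags accesses in the failed run that fall outside every reference region.

```python
# Sketch of the layout-comparison idea: regions recovered from passed
# executions serve as a reference; accesses in the failed execution that
# fall outside every reference region are flagged. Data is hypothetical.

def out_of_bounds(reference_regions, accesses):
    """reference_regions: list of (base, size) pairs from passed runs.
    accesses: list of addresses touched in the failed run."""
    suspicious = []
    for addr in accesses:
        if not any(base <= addr < base + size for base, size in reference_regions):
            suspicious.append(addr)  # no recovered region covers this address
    return suspicious

regions = [(0x1000, 64), (0x2000, 128)]   # layouts from passed executions
failed_run = [0x1010, 0x1040, 0x2050]     # 0x1040 is one byte past region 1
print([hex(a) for a in out_of_bounds(regions, failed_run)])  # ['0x1040']
```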
- Book Chapter
- 8
- 10.1007/11951957_28
- Jan 1, 2006
The increasing popularity of Web services for application integration has resulted in a large body of research on Web service composition. However, the major lacuna so far in Web service composition is the lack of a holistic requirements-driven approach for modeling the entire Web service lifecycle, i.e., composition, joint execution, midstream adaptation in response to failures or changing requirements, and finally re-execution until successful completion. In this paper we present such an approach based on our earlier work on context-driven Web service modeling. In particular, we separate requirements into two parts – functional and extra-functional requirements (FRs and EFRs, respectively). We express FRs as commitments made by individual Web services towards the composite Web service, and EFRs as rules that constrain the behavior of the individual Web services while they execute against their FRs. We also show how midstream adaptation in Web service execution – caused either by changes in user requirements or execution failures – can be managed in our approach. We believe that ours is the first such approach towards a comprehensive modeling of requirements for composite Web service executions, and especially during adaptation.
- Research Article
- 10.37276/sjss.v6i1.542
- Nov 29, 2025
- SIGn Journal of Social Science
The development of labor law in Indonesia aims to achieve social justice by protecting workers’ rights. However, the reality of law enforcement reveals a juridical anomaly in which Industrial Relations Court decisions that have obtained permanent legal force (inkracht van gewijsde) are frequently not executed (non-executable). Consequently, workers’ normative rights, such as the right to severance pay and unpaid wages, remain unfulfilled. This study aims to analyze the root causes of such execution failures and to formulate effective institutional solutions. Utilizing a normative juridical research method with statutory, conceptual, and case approaches, this research examines the dependence of Law Number 2 of 2004 on the Civil Procedure Law—specifically the HIR/RBg—which is passive and formalistic in nature. The results indicate that execution failure is caused by the burden of asset proof being placed entirely on the worker. Furthermore, the absence of the court’s investigative authority to conduct asset tracing, along with judges’ weak application of Conservatory Attachment (Conservatoir Beslag) and Penalty Payments (Dwangsom), exacerbates the situation. Dependence on the archaic HIR/RBg procedures proves incompatible with the characteristics of labor disputes that demand speed. This is worsened by the dynamics of non-standard employment relationships in the gig economy, which are vulnerable to asset stripping. This study concludes that without procedural law reform, Industrial Relations Court decisions remain merely illusory judgments. Therefore, the establishment of a Special Execution Unit within the Industrial Relations Court, with autonomous authority to access integrated asset data, is recommended. Additionally, the issuance of regulations mandating the application of Dwangsom on an ex officio basis in every condemnatory (condemnatoir) decision is necessary to guarantee legal certainty and substantive justice for workers.
- Research Article
- 13
- 10.1007/bf01053809
- Apr 1, 1994
- Journal of Financial Services Research
The First Executive Corporation was the largest failure in the history of the life insurance industry. The company was one of the most aggressive purchasers of junk bonds through the 1980s and was the first of several large failures in the staid life insurance industry. In this article, we examine the effect of First Executive's failure on the value of companies in the life insurance industry. We find that the price of other life insurance companies' stock is negatively affected by the earnings announcement that preceded First Executive's failure. The magnitude of an individual company's reaction to First Executive's loss varies according to the proportion of the company's assets invested in junk bonds, the proportion of the company's assets invested in real estate, and the financial strength of the company as measured by A.M. Best's rating.
- Research Article
- 5
- 10.1007/s11761-015-0176-z
- Feb 22, 2015
- Service Oriented Computing and Applications
In the protocol-based Web service composition, the runtime unavailability of component services may result in a failed execution of the composite. In the literature, multiple recovery heuristics have been proposed. This work provides a formal study and focuses on the complexity issues of the recovery problem in the protocol-based Web service composition. A recovery is a process responsible for migrating the failed execution into an alternative execution of the composite that still has the ability to reach a final state. The alternative execution is called a recovery execution. Following failure occurrence, several recovery executions may be available. The problem of finding the best recovery execution(s) is called the recovery problem. Several criteria may be used to determine the best recovery execution(s). In this work, we define the best recovery execution as the one which is attainable from the failed execution with a maximal number of invisible compensations with respect to the client. We assume that all transitions are compensatable. For a given recovery execution, we prove that the decision problem associated with computing the number of invisibly compensated transitions is NP-complete, and thus, we conclude that deciding the best recovery execution is in $\Sigma_2^P$.
- Book Chapter
- 2
- 10.1007/978-3-642-39955-8_9
- Jan 1, 2013
Concurrency bugs are hard to find due to the nondeterministic behavior of concurrent programs. In this paper, we present an algorithm for isolating erroneous event patterns in concurrent program executions. Failed executions are characterized as sequences of switch points, which capture the interleaving of read and write events on shared variables. The algorithm takes the switch-point sequence of a failed execution as input and outputs erroneous event patterns. We implemented our algorithm and conducted an experimental evaluation on several Java benchmark programs. The results of our evaluation show that our approach can effectively and efficiently identify erroneous event patterns in failed executions.
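The notion of a switch point can be made concrete with a small sketch. This is an illustration of the concept only, not the paper's isolation algorithm: an execution is a hypothetical sequence of `(thread, op, var)` events, and a switch point is any position where the scheduled thread changes.

```python
# Sketch: a failed execution as a sequence of (thread, op, var) events;
# a switch point is a position where the executing thread changes.
# The event trace is illustrative, not from the paper's benchmarks.

def switch_points(events):
    """events: list of (thread, op, var) tuples, e.g. ('t1', 'write', 'x')."""
    points = []
    for i in range(1, len(events)):
        if events[i][0] != events[i - 1][0]:
            points.append(i)  # context switch between events i-1 and i
    return points

trace = [('t1', 'write', 'x'), ('t1', 'read', 'y'),
         ('t2', 'write', 'x'),                       # switch at index 2
         ('t1', 'read', 'x')]                        # switch at index 3
print(switch_points(trace))  # [2, 3]
```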
- Conference Article
- 7
- 10.1109/valid.2010.8
- Aug 1, 2010
In this work, we leverage data collected from hardware performance counters as an abstraction of program executions, and we use these abstractions to identify likely causes of failures. Our approach can be summarized as follows: hardware counter data is collected from both successful and failed executions; the data from successful executions is used to create models of programs' normal behavior; and deviations from these models observed in failed executions are scored and reported as likely causes of failures. The results of our experiments, conducted on three open source projects, suggest that the proposed approach can effectively prioritize the space of likely failure causes, which can in turn improve the turnaround time for defect fixes.
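The model-and-score workflow can be sketched with a simple statistical baseline. This is a hedged illustration, not the paper's actual model: it assumes hypothetical counter names and values, builds a per-counter mean and standard deviation from passing runs, and ranks counters in the failed run by their z-score deviation.

```python
# Sketch of the scoring idea: build per-counter means/stddevs from passing
# runs, then rank counters in a failed run by how far they deviate.
# Counter names and values are hypothetical.

from statistics import mean, stdev

def score_deviations(passing_runs, failed_run):
    """passing_runs: list of dicts counter -> value; failed_run: one dict."""
    scores = {}
    for counter in failed_run:
        values = [run[counter] for run in passing_runs]
        mu, sigma = mean(values), stdev(values)
        scores[counter] = abs(failed_run[counter] - mu) / sigma if sigma else 0.0
    # highest score first: the most anomalous counters are the likeliest causes
    return sorted(scores, key=scores.get, reverse=True)

passing = [{'branch_misses': 100, 'cache_misses': 200},
           {'branch_misses': 110, 'cache_misses': 210},
           {'branch_misses': 105, 'cache_misses': 190}]
failed = {'branch_misses': 108, 'cache_misses': 900}
print(score_deviations(passing, failed))  # ['cache_misses', 'branch_misses']
```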
- Research Article
- 10.47772/ijriss.2025.910000517
- Nov 18, 2025
- International Journal of Research and Innovation in Social Science
This article critically examines the complex implementation of the Al-Quran Tahfiz (Memorization) program for Visually Impaired Students at the Tahfiz for the Visually Impaired (TVI), focusing on the analysis of perceptual divergence between teachers and students. The study employed quantitative field survey data collected from Teachers (N=3) and Students (N=7). The findings reveal a compelling Ecological Paradox: while the program achieves superior spiritual and personal efficacy (mean agreement exceeding 4.90), it confronts three critical, yet divergently prioritized, systemic challenges, indicating a failure in execution across various ecological system levels. Students identified the Absolute Lack of Modern Assistive Technology (Digital Braille) as the single most severe barrier (with an absolute mean of 5.00), representing a direct failure in the TVI's Microsystem. Conversely, teachers acknowledged the Urgent Need for Specialized Training in Inclusive Pedagogy as their top priority (4.67), signalling a deficit within the Mesosystem (the interface between teacher training and the institute). Furthermore, the exceptionally high student demand for Specialized Psychological Counselling Services (4.86) confirmed a deep emotional support gap. This paper provides an in-depth analysis of these divergences, framing them within Bronfenbrenner’s Ecological Systems Theory, the principles of Maqasid Shariah (Objectives of Islamic Law), and Universal Design for Learning (UDL). It details an Action-Oriented Roadmap to translate the spiritual success of the program into sustainable technical and methodological competence at TVI.
- Research Article
- 10.12816/0018851
- Aug 1, 2014
- Kuwait Chapter of Arabian Journal of Business and Management Review
Geographical Information Systems (GIS) are receiving more attention as projects grow in complexity and scale. Appropriate selection of the required system, successful implementation, and effective use of these systems are therefore concerns of top managers and IT managers in all organizations, and the implementation and execution of such systems is one of the main determinants of their success. The present study investigated the success of GIS projects through a review of the literature and analytical methods. It also evaluated the main factors of project failure (relevant risks) so that implementation can succeed by managing them. For data collection, library research and a field study (questionnaire) were used. The factors agreed upon by experts as identifying success or failure of execution were ranked using the Analytic Hierarchy Process (AHP), a decision-making method based on pairwise comparison. To investigate mutual effects, the DEMATEL method was used, and hypotheses were tested to examine the factors affecting the success or failure of GIS projects. It can be said that various factors with different weights affect the various signs of failure of GIS execution. Indeed, all main factors, such as organizational conditions, geography, human resources,
- Conference Article
- 10.1109/tencon.1993.320003
- Oct 19, 1993
Backtracking algorithms try to solve problems by generating and testing a set of possible solutions. Typically, there are some points in the execution of a backtracking algorithm where execution can proceed in more than one way. Such points are called choice points. The backtracking technique has been incorporated into variants of many high level languages, mostly procedural languages such as Pascal, Fortran and LISP. Also, backtracking is one of the major features of Prolog. A programming language with built-in support for backtracking algorithms is called a backtracking language. Typically its semantics recognise success or failure of execution, and it supports programming of choice points. It hides details about how choices are made at choice points, how to backtrack, and how to resume the previous state. Backtracking languages find wide application in artificial intelligence algorithms. We compare the backtracking mechanisms of Prolog and other backtracking programming languages, and present a new backtracking language.
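Choice points and backtracking can be mimicked in an ordinary host language. The sketch below is illustrative only (the toy all-distinct constraint is not from the paper): at each choice point it tries alternatives in order, propagates the first success, and returns to the previous choice point on failure.

```python
# Sketch: choice points in a host language. At each choice point we try
# alternatives in order and backtrack on failure. Constraint is illustrative.

def solve(assignment, domains, consistent):
    """Try to extend `assignment` with one value from each remaining domain."""
    if not domains:
        return assignment                 # success: all choices made
    for value in domains[0]:              # choice point: try each alternative
        candidate = assignment + [value]
        if consistent(candidate):
            result = solve(candidate, domains[1:], consistent)
            if result is not None:
                return result             # propagate first success
    return None                           # all alternatives failed: backtrack

# Toy constraint: all chosen values must be distinct.
all_distinct = lambda xs: len(xs) == len(set(xs))
print(solve([], [[1, 2], [1, 2], [1, 2, 3]], all_distinct))  # [1, 2, 3]
```

A backtracking language hides this machinery: the failure-driven return to the previous choice point, explicit here as the `return None` path, is part of the language semantics rather than user code.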
- Research Article
- 5
- 10.1108/aaaj-11-2022-6164
- Jan 4, 2024
- Accounting, Auditing & Accountability Journal
PurposeThe purpose of this paper is to investigate the extent to which interdisciplinary (HASS, i.e. non-STEM) factors—in particular, accounting, stakeholder management and accountability—enable, influence and motivate large human exploration ventures, principally in maritime and space fields, utilizing Columbus’s and Chinese explorations of the 1400s as the primary setting.Design/methodology/approachThe study analyzes archival data from narrative and interpretational history, including both academic and non-academic sources, that relate to two global historical events, the Columbus and Ming Chinese exploration eras (c. 1400–1500), as a parallel to the modern “Space Race”. Existing studies on pertinent HASS (Humanities and Social Sciences) and STEM (Science, Technology, Engineering and Mathematics) enablers, influencers and motivators are utilized in the analysis. The authors draw upon the concepts of stakeholder theory and the construct of accountability in their analysis.FindingsFindings suggest that non-STEM considerations—politics, finance, accountability, culture, theology and others—played crucial roles in enabling Western Europe (Columbus) to reach the Americas before China or other global powers, demonstrating the pivotal importance of HASS factors in human advancements and exploration.Research limitations/implicationsIn seeking to answer those questions, this study identifies only those factors (HASS or STEM) that may support the success or failure in execution of the exploration and development of a region such as the New World or Space. Moreover, the study has the following limitation. Relative successes, failures, drivers and enablers of exploratory ventures are drawn almost exclusively from the documented historical records of the nations, entities and individuals (China and Europe) who conducted those ventures. 
A paucity of objective sources in some fields, and the need to set appropriate boundaries for the study, also necessitate such limitation.Practical implicationsIt is observable that many of those HASS factors also appear to have been influencers in modern era Space projects. For Apollo and Soyuz, success factors such as the relative economics of USA and USSR, their political ideologies, accountabilities and organizational priorities have clear echoes. What the successful voyages of Columbus and Apollo also have in common is an appetite to take risks for an uncertain return, whether as sponsor or voyager; an understanding of financial management and benefits measurement, and a leadership (Isabella I, John F. Kennedy) possessing a vision, ideology and governmental apparatus to further the venture’s goals.Originality/valueWhilst various historical studies have examined influences behind the oceangoing explorations of the 1400s and the colonization of the “New World”, this article takes an original approach of analyzing those motivations and other factors collectively, in interdisciplinary terms (HASS and STEM). This approach also has the potential to provide a novel method of examining accountability and performance in modern exploratory ventures, such as crewed space missions.
- Research Article
- 2
- 10.54783/ijsoc.v3i4.476
- Dec 27, 2021
- International Journal of Science and Society
Collaborative governance seems to be a response to failures in execution, exorbitant costs, and the politicization of public sector laws. The emphasis is on all stages of public policy. The unprecedented COVID-19 epidemic has compelled the government to be prepared to deal with it, and to do so fast and effectively. To combat the epidemic, the Indonesian government has implemented a variety of strategies, including social restrictions, mandated immunizations, national economic recovery, and so on. However, the government cannot undertake collaborative governance alone; thus, cross-disciplinary and cross-field cooperation is required. As a result, the purpose of this research is to explore the application of collaborative governance in dealing with COVID-19 in Indonesia. This research combines a qualitative methodology with a descriptive approach. According to the findings of the research, collaborative governance in the management of COVID-19 has four key values: consensus orientation, collective leadership, multi-way communication, and resource sharing. Collaborative governance in COVID-19 management may take the form of, among other things, publicizing the hazards of COVID-19, providing masks, creating and spraying disinfectants, and distributing hand sanitizers.
- Conference Article
- 7
- 10.1109/jictee.2014.6804098
- Mar 1, 2014
Text-based CAPTCHA images are widely used in online applications to block malicious programs that attempt to cause failures in execution or computation. Although installing a CAPTCHA enhances a system's security, it must be continuously analysed, improved, and developed to resist decoding or extraction by automatic programs. This paper is mainly focused on the examination of text-based CAPTCHA images with several degrees of noise, skew, and font type and size. The Template Matching Correlation (TMC) technique, consisting of image conversion, thresholding, noise rejection, segmentation, and recognition steps, is introduced for the analysis. Simulation results show that robustness increases when the image is distorted by background noise in the range of 0.3 to 0.4 and font skew of 10° to 15°, while remaining easily recognized by humans.
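The matching step at the heart of such a pipeline can be sketched in one dimension. This is a minimal illustration, not the paper's TMC implementation: it assumes an already-binarized row of pixels and a hypothetical glyph template, and scores each offset by the fraction of matching pixels.

```python
# Sketch of template matching on a binarized 1-D strip: slide the template
# over the row and score each offset by the fraction of matching pixels;
# the best-scoring offset locates the character. Data is illustrative.

def match_score(segment, template):
    """Fraction of pixels that agree between a segment and the template."""
    return sum(a == b for a, b in zip(segment, template)) / len(template)

def best_offset(row, template):
    """Offset in `row` where the template fits best."""
    scores = [match_score(row[i:i + len(template)], template)
              for i in range(len(row) - len(template) + 1)]
    return max(range(len(scores)), key=scores.__getitem__)

template = [1, 1, 0, 1]              # binarized glyph template
row = [0, 0, 1, 1, 0, 1, 0, 0]       # binarized image row
print(best_offset(row, template))    # 2
```

A full 2-D recognizer would slide the template in both axes and normalize the correlation, but the score-and-pick-maximum structure is the same.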