Articles published on Code coverage
602 Search results
- Research Article
- 10.7860/jcdr/2026/81216.22192
- Jan 1, 2026
- JOURNAL OF CLINICAL AND DIAGNOSTIC RESEARCH
- Hari Shankar Kumar + 4 more
Introduction: Charcot-Marie-Tooth disease (CMT) is one of the most common inherited Neuromuscular Disorders (NMDs), classified under peripheral neuropathies and characterised by progressive motor and sensory dysfunction. Although Whole-exome Sequencing (WES), gene panels, and conventional methods have improved detection rates, they often miss deep intronic, regulatory, and Structural Variants (SVs). Whole-genome Sequencing (WGS), with its comprehensive coverage of coding and non-coding regions, enables the identification of variants that are often overlooked by other approaches. Aim: To assess the diagnostic utility of WGS in CMT cases that remain unresolved by WES, analysing both coding and non-coding variants. Materials and Methods: The present cross-sectional diagnostic study was conducted between July 2023 and January 2025 at the Neuberg Center for Genomic Medicine (NCGM), Ahmedabad, Gujarat, India. WGS was performed on 31 clinically suspected CMT patients, including two who had previously tested negative by WES. Both coding and non-coding variants, including missense, nonsense, frameshift, in-frame, intronic, and 5’ Untranslated Region (UTR) mutations, were analysed. Variants were classified according to American College of Medical Genetics and Genomics (ACMG) guidelines, incorporating Combined Annotation Dependent Depletion (CADD) scores and Minor Allele Frequency (MAF) thresholds. They were interpreted based on pathogenicity, inheritance patterns, and genotype-phenotype correlations. Selected non-coding variants in the Gap Junction Beta-1 (GJB1; c.-16-511G>C) and Lamin A/C (LMNA; c.-142C>A) genes were validated by Sanger sequencing. Results: Sequencing data from 31 participants were processed using a standardised bioinformatics pipeline. Variants were classified according to ACMG guidelines, and their frequencies were calculated. WES and WGS results were compared to determine the additional diagnostic yield.
WGS identified clinically significant non-coding variants in GJB1 (intronic) and LMNA (5’ UTR) in two cases, yielding a 6.5% increase in diagnostic yield over WES. Overall, 31 variants were detected: 11 (35.5%) classified as pathogenic, 2 (6.5%) as likely pathogenic, and 18 (58.0%) as Variants of Uncertain Significance (VUS), reflecting the genetic heterogeneity of CMT. Conclusion: WGS enhances diagnostic accuracy in CMT by detecting clinically relevant non-coding variants often missed by WES. This is the first report from India confirming a GJB1 intronic variant and an LMNA 5’ UTR variant using WGS in CMT patients. These findings support the integration of WGS into routine diagnostic workflows and highlight the value of comprehensive variant analysis for early and precise genetic diagnosis.
- Research Article
- 10.3390/s26010077
- Dec 22, 2025
- Sensors (Basel, Switzerland)
- Minrui Yan + 3 more
The rise of connected and automated vehicles has transformed in-vehicle infotainment (IVI) systems into critical gateways linking user interfaces, vehicular networks, and cloud-based fleet services. A concerning architectural reality is that hardcoded credentials like access point names (APNs) in IVI firmware create a cross-layer attack surface where local exposure can escalate into entire vehicle fleets being remotely compromised. To address this risk, we propose a cross-layer security framework that integrates firmware extraction, symbolic execution, and targeted fuzzing to reconstruct authentic IVI-to-backend interactions and uncover high-impact web vulnerabilities such as server-side request forgery (SSRF) and broken access control. Applied across seven diverse automotive systems, including major original equipment manufacturers (OEMs) (Mercedes-Benz, Tesla, SAIC, FAW-VW, Denza), Tier-1 supplier Bosch, and advanced driver assistance systems (ADAS) vendor Minieye, our approach exposes systemic anti-patterns and demonstrates a fully realized exploit that enables remote control of approximately six million Mercedes-Benz vehicles. All 23 discovered vulnerabilities, including seven CVEs, were patched within one month. In closed automotive ecosystems, we argue that the true measure of efficacy lies not in maximizing code coverage but in discovering actionable, fleet-wide attack paths, which is precisely what our approach delivers.
- Research Article
- 10.1145/3742894
- Dec 19, 2025
- ACM Transactions on Software Engineering and Methodology
- Ao Li + 4 more
Parametric generators combine coverage-guided and generator-based fuzzing for testing programs requiring structured inputs. They function as decoders that transform arbitrary byte sequences into structured inputs, allowing mutations on byte sequences to map directly to mutations on structured inputs, without requiring specialized mutators. However, this technique is prone to the havoc effect, where small mutations on the byte sequence cause large, destructive mutations to the structured input. This article investigates the paradoxical nature of the havoc effect for generator-based fuzzing in Java. In particular, we measure mutation characteristics and confirm the existence of the havoc effect, as well as scenarios where it may be more detrimental. Our evaluation across seven real-world Java applications compares various techniques that perform context-aware, finer-grained mutations on parametric byte sequences, such as JQF-EI, BeDivFuzz, and Zeugma. We find that these techniques exhibit better control over input mutations and consistently reduce the havoc effect compared to our coverage-guided fuzzer baseline, Zest. While we find that context-aware mutation approaches can achieve significantly higher code coverage, we see that destructive mutations still play a valuable role in discovering inputs that increase code coverage. Specialized mutation strategies, while effective, impose substantial computational overhead, revealing practical tradeoffs in mitigating the havoc effect.
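The decoder idea above can be made concrete with a toy sketch: the byte sequence is a stream of decisions that decodes into a structured input (here, a small arithmetic expression). The encoding below is an illustrative assumption, not the scheme used by any of the tools named in the abstract.

```python
# Toy parametric generator: bytes are consumed as decoding decisions.
def decode(params: bytes) -> str:
    it = iter(params)
    take = lambda: next(it, 0)  # an exhausted stream decodes to 0

    def expr(depth: int) -> str:
        choice = take() % 3
        if depth >= 2 or choice == 0:        # leaf: a one-digit literal
            return str(take() % 10)
        op = "+" if choice == 1 else "*"     # internal node: binary operator
        return f"({expr(depth + 1)} {op} {expr(depth + 1)})"

    return expr(0)

seed = bytes([1, 2, 3, 1, 4, 5, 0, 7])
mutated = bytes([0]) + seed[1:]  # flip only the first byte

# Every later decision consumes the same stream, so one early byte
# change can rewrite the whole structured input -- the havoc effect:
print(decode(seed))     # a nested expression: ((1 * 5) + 7)
print(decode(mutated))  # collapses to a single literal: 2
```

The single-byte mutation at the front reinterprets every subsequent byte, which is exactly the large, destructive structural change the havoc effect describes.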
- Research Article
- 10.36548/jucct.2025.4.003
- Dec 1, 2025
- Journal of Ubiquitous Computing and Communication Technologies
- Baskaran S + 2 more
Unit testing plays a crucial role in application software development by validating module functionality in isolation before system integration. Manually writing and reviewing unit test cases is time-consuming and defect-prone. Complex logic and boundary conditions are not tested thoroughly, leading to higher rework costs. Automated test generation using Large Language Models (LLMs) reduces development effort but faces challenges such as ensuring meaningful test coverage, handling invalid inputs, and addressing missing imports. This study aims to leverage LLMs in combination with the Autogen Agentic AI framework to generate high-quality Python unit tests by effectively prompting them, fixing failed test cases, validating them through test execution, analyzing results, and improving code coverage and mutation score. For experiments conducted on the Insurance Management Application, branch coverage improved from 98% to 99%, and the mutation score improved from 83.9% to 95.8%. The proposed approach significantly reduces manual effort while improving test suite effectiveness and software quality.
- Research Article
- 10.1142/s0218539325500445
- Nov 14, 2025
- International Journal of Reliability, Quality and Safety Engineering
- Joshua Steakelum + 3 more
Large-scale software exhibits periods of increased defect discovery when blocks of less thoroughly tested code are introduced into an existing codebase. For example, the mission systems schedule of software-intensive government acquisition programs includes multiple overlapping software blocks associated with various capabilities. Software reliability researchers have proposed changepoint models to characterize periods of increased defect discovery. However, these models attempt to identify the location of these changepoints after testing has been performed, which is counter-intuitive because conscious decisions such as testing new functionality drive software changepoints. Existing changepoint models are therefore difficult to employ in a predictive manner. To overcome this limitation, this paper proposes a covariate software defect discovery model capable of explaining changepoints in terms of common software testing activities and metrics such as software size estimation, code coverage, and defect density. The proposed and past changepoint models are compared with respect to their predictive accuracy and computational efficiency. Our results indicate that the proposed approach is more computationally efficient and enables accurate prediction of the time needed to achieve a desired defect discovery intensity or mean time to failure despite the occurrence of changepoints during software testing.
- Research Article
- 10.1145/3763185
- Oct 9, 2025
- Proceedings of the ACM on Programming Languages
- Chenyang Ma + 2 more
Fuzzing is an effective technique to detect vulnerabilities in smart contracts. The challenge of smart contract fuzzing lies in the statefulness of contracts, which means that certain vulnerabilities can only be manifested in specific contract states. State-of-the-art fuzzers may generate and execute a plethora of meaningless or redundant transaction sequences during fuzzing, incurring a penalty in efficiency. To this end, we present DepFuzz, a hybrid fuzzer for efficient smart contract fuzzing, which introduces a symbolic execution module into the feedback-based fuzzer. Guided by distance-based dependencies between functions, DepFuzz can efficiently yield meaningful transaction sequences that contribute to vulnerability exposure or code coverage. The experiments on 286 benchmark smart contracts and 500 large real-world smart contracts corroborate that, compared to state-of-the-art approaches, DepFuzz achieves a higher instruction coverage rate and uncovers many more vulnerabilities in less time.
- Research Article
- 10.4314/njt.v44i2.15
- Sep 30, 2025
- Nigerian Journal of Technology
- N.O Eke + 4 more
This study surveys Mobile Applications Testing (MAT) to determine the research contributions in the field, such as the test approaches, test strategies, testing techniques, and evaluation methods used, as well as the publication frequency and publication regions. The study adopted the guidelines provided by Kitchenham and Charters, and Petersen et al., for conducting a systematic mapping study. A total of 242 studies were selected using predefined inclusion/exclusion criteria. Studies were retrieved from five major academic databases (IEEE Xplore, ACM Digital Library, ScienceDirect, Web of Science, and EBSCOhost) using validated search strings. Findings show that MAT publications increased steadily between 2009 and 2022, with 2018 recording the highest number (n = 33). China and the United States were the most active contributors. Model-based testing emerged as the most commonly used testing technique, while fault detection and code coverage were the most widely adopted evaluation methods. Dynodroid, with 952 citations and an NCII score of 79.3, was identified as the most influential MAT-related study. This study presents a structured overview of MAT research trends, methods, and influential works, offering a valuable reference for researchers and practitioners in the software testing community.
- Research Article
- 10.3390/fi17100450
- Sep 30, 2025
- Future Internet
- Mostafa Kira + 4 more
Facial recognition systems are increasingly used for authentication across domains such as finance, e-commerce, and public services, but their growing adoption raises significant concerns about spoofing attacks enabled by printed photos, replayed videos, or AI-generated deepfakes. To address this gap, we introduce a multi-layered Face Recognition-as-a-Service (FRaaS) platform that integrates passive liveness detection with active challenge–response mechanisms, thereby defending against both low-effort and sophisticated presentation attacks. The platform is designed as a scalable cloud-based solution, complemented by an open-source SDK for seamless third-party integration, and guided by ethical AI principles of fairness, transparency, and privacy. A comprehensive evaluation validates the system’s logic and implementation: (i) Frontend audits using Lighthouse consistently scored above 96% in performance, accessibility, and best practices; (ii) SDK testing achieved over 91% code coverage with reliable OAuth flow and error resilience; (iii) Passive liveness layer employed the DeepPixBiS model, which achieves an Average Classification Error Rate (ACER) of 0.4 on the OULU–NPU benchmark, outperforming prior state-of-the-art methods; and (iv) Load simulations confirmed high throughput (276 req/s), low latency (95th percentile at 1.51 ms), and zero error rates. Together, these results demonstrate that the proposed platform is robust, scalable, and trustworthy for security-critical applications.
- Research Article
- 10.1007/s10664-025-10726-x
- Sep 22, 2025
- Empirical Software Engineering
- Tarek Mahmud + 3 more
Android dominates the mobile operating system market, yet ensuring the quality and reliability of Android applications remains a persistent challenge. The diversity of devices, screen sizes, and OS versions complicates testing, leading to fragmented adoption of best practices. Despite advancements in automated testing, there is limited empirical evidence on how developers test Android applications and the extent to which existing tools and frameworks are utilized effectively. In this paper, we aim to investigate the current state of Android app testing, identifying key challenges, limitations, and best practices. Specifically, we assess the adoption of automated testing, test coverage levels, and the impact of testing practices on software quality. We conduct an experimental study on 2965 open-source Android apps, examining the quantity and coverage of the tests used for open-source Android app development. We further conduct a survey to gather more insights into testing practices from Android app developers and testers. The results reveal limited adoption of testing among Android app developers, a restricted range of testing tools and frameworks in use, and low code and API coverage in testing. This investigation shows that current Android app testing practices lack automated tooling and points to a need for greater awareness and adoption of state-of-the-art testing tools and techniques.
- Research Article
- 10.1145/3765754
- Sep 4, 2025
- ACM Transactions on Software Engineering and Methodology
- Matthew C Davis + 5 more
There is substantial diversity among testing tools used by software engineers. For example, fuzzers may target crashes and security vulnerabilities while Test sUite Generators (TUGs) may create high-coverage test suites. In the research community, test generation tools are primarily evaluated using metrics like bugs identified or code coverage. However, achieving good values for these metrics does not necessarily imply that these tools help software engineers efficiently develop effective test suites. To understand the test suite generation process, we performed a secondary analysis of recordings from a previously published user study in which 28 professional software engineers used two tools to generate test suites for three programs with each tool. From these 168 recordings (28 users × 2 tools × 3 programs/tool), we extracted a process model of test suite generation called TestLoop that builds upon prior work and systematizes a user’s test suite generation process for a single function into 7 steps. We then used TestLoop’s steps to describe 8 prior and 10 new recordings of users generating test suites using the Jest, Hypothesis, and NaNofuzz test generation tools. Our results showed that TestLoop can be used to help answer previously hard-to-answer questions about how users interact with test suite generation tools and to identify ways that tools might be improved.
- Research Article
- 10.1002/stvr.70009
- Sep 1, 2025
- Software Testing, Verification and Reliability
- Afonso Fontes + 2 more
Search-based test generation typically targets structural coverage of source code. Past research suggests that targeting coverage alone is insufficient to yield tests that achieve common testing goals (e.g., discovering situations where a class-under-test throws exceptions) or detect faults. A suggested alternative is to perform multi-objective optimization targeting both coverage and additional objectives directly related to the goals of interest. However, it is not fully clear how coverage and goal-based objectives interact during the generation process and what effects this interaction will have on the generated test suites. In this study, we assess five hypotheses about multi-objective test generation and the relationships between coverage-based and goal-based objectives, focusing on the effects on coverage, goal attainment, fault detection, test suite size, test case length, and the impact of the search budget. We generate test suites using the EvoSuite framework targeting Branch Coverage, three testing goals (Exception Count, Output Coverage, and Execution Time), and combinations of coverage and goal-based objectives. Ultimately, we find that targeting multiple objectives does not reduce code coverage, yields no or minor reductions in goal attainment, and at the same time detects more faults compared with single-target configurations. In addition, it produces larger test suites, but test case length is not increased. The benefits of multi-objective optimization are often more limited than hypothesized in past research, but improved fault detection is still sufficient to recommend multi-objective optimization over targeting coverage or testing goals alone. Our study offers insights and guidance into how coverage and goal-based objectives interact during multi-objective test generation.
- Research Article
- 10.59562/jessi.v6i2.9588
- Aug 31, 2025
- Journal of Embedded Systems, Security and Intelligent Systems
- Mushaf + 4 more
This study analyzes the development and impact of FOKUS!, a web-based scheduling application designed to help individuals, particularly students and professionals, manage time and tasks effectively. The system was built using an Agile approach across three sprints, involving UI/UX design, backend setup with Next.js and Supabase, and implementation of task management and notification features. Results from White Box testing indicated 100% frontend code coverage, with stable component rendering and logic validation. Black Box testing confirmed that core features (registration, login, task CRUD, and reminders) functioned as intended. Feasibility studies showed that the system is viable technically, economically, and organizationally. Users appreciated the intuitive interface and real-time synchronization. The system is expected to positively impact productivity and time efficiency. Future improvements include enhancing backend testing, integrating external calendar services, and conducting user acceptance testing to ensure a better user experience.
- Research Article
- 10.1007/s10664-025-10712-3
- Aug 30, 2025
- Empirical Software Engineering
- Muhammad Imran + 4 more
Performance testing aims to ensure the operational efficiency of software systems. However, many factors influencing the efficacy and adoption of performance tests in practice are not yet fully understood. For instance, while code coverage is widely regarded as a key quality metric for evaluating the efficacy of functional testing suites, there is limited knowledge about the types and levels of coverage that performance tests specifically achieve. Another important factor, often perceived as a barrier to the broader adoption of performance tests yet remaining relatively unexplored, is their extended execution time. In this paper, we examine (i) the coverage of performance testing suites, (ii) the characteristics of source code associated with performance-tested components, and (iii) the time cost of executing performance tests. Our analysis on open-source Java systems reveals that performance tests achieve significantly lower code coverage than functional tests, as expected, and it highlights a significant trade-off between coverage and execution time. Our results also indicate a lack of generalizable characteristics in the source code covered by performance tests.
- Research Article
- 10.30574/ijsra.2025.16.2.2287
- Aug 30, 2025
- International Journal of Science and Research Archive
- Aparna Mohan
The integration of Large Language Models (LLMs) into the hardware design verification (DV) landscape represents a pivotal moment in the evolution of verification workflows. LLMs offer powerful capabilities for natural language processing, code generation, and collaborative assistance, allowing them to bridge gaps between code comprehension, coverage analysis, and team communication. This review synthesizes the most recent developments in LLM-driven DV, covering assertion generation, coverage diagnostics, and UVM testbench completion. We propose an architectural model where modular LLM agents act as code analyzers, coverage interpreters, and assertion suggesters, working alongside human engineers. Experimental findings show clear advantages in accuracy, interpretability, and engineering efficiency. We conclude with an analysis of emerging trends and the necessary steps to industrialize LLM adoption in formal verification.
- Research Article
- Aug 8, 2025
- Lakartidningen
- Ola Olén + 1 more
The Swedish National Patient Register (NPR) is vital for epidemiological research. A 2010 review assessed the validity of inpatient diagnoses, but outpatient data were excluded. A recent review in the European Journal of Epidemiology examined validation studies for inpatient diagnoses post-2010 and outpatient data since 2001. Across 89 publications, median positive predictive value (PPV) was 84% for diagnoses (range: 18-100%) and 97% for surgical codes. Sensitivity was lower, median 73% (range: 45-80%). PPV and sensitivity varied depending on diagnosis, coding, reference standard, and data source. Different diagnostic criteria are needed depending on research question. Combining NPR with other registers can enhance accuracy. Limitations include incomplete outpatient data in early years, gaps in private healthcare reporting, and insufficient coverage of certain medical codes. Despite this, the NPR remains a reliable and central data source for medical research when limitations are considered.
- Research Article
- 10.1145/3715102
- Jul 26, 2025
- ACM Transactions on Computer Systems
- Cong Li + 3 more
We introduce the concept of compilation space as a new pivot for the comprehensive validation of just-in-time (JIT) compilers in modern language virtual machines (LVMs). The compilation space of a program encompasses a wide range of equivalent JIT-compilation choices, which can be cross-validated to ensure the correctness of the program’s JIT compilations. To thoroughly explore the compilation space in a lightweight and LVM-agnostic manner, we strategically mutate test programs with JIT-relevant but semantics-preserving code constructs, aiming to provoke diverse JIT compilation optimizations. We primarily implement this approach in Artemis, a tool for validating Java Virtual Machines (JVMs). Within three months, Artemis successfully discovered 85 bugs in three widely used production JVMs (HotSpot, OpenJ9, and the Android Runtime), of which 53 were already confirmed or fixed and many were classified as critical. It is noteworthy that all reported bugs concern JIT compilers, highlighting the effectiveness and practicality of our technique. Building on the promising results with JVMs, we experimentally applied our technique to a state-of-the-art JavaScript Engine (JSE) fuzzer called Fuzzilli, aiming to augment it to find mis-compilation bugs without significantly sacrificing its ability to detect crashes. Our experiments demonstrate that our enhanced version of Fuzzilli, named Apollo, could achieve comparable code coverage with a considerably smaller number of generated programs while triggering a similar number of crashes. Additionally, Apollo successfully uncovered four mis-compilations in JavaScriptCore and SpiderMonkey within seven days. Following Artemis’ and Apollo’s success, we expect that the generality and practicability of our approach will make it broadly applicable for understanding and validating the JIT compilers of other LVMs.
- Research Article
- 10.3390/electronics14142914
- Jul 21, 2025
- Electronics
- Shuoyu Tao + 2 more
Fuzz testing plays a key role in improving Linux kernel security, but large-scale fuzzing often generates a high number of crash reports, many of which are redundant. These duplicated reports burden triage efforts and delay the identification of truly impactful bugs. Syzkaller, a widely used kernel fuzzer, clusters crashes using instruction pointers and sanitizer metadata. However, this heuristic may misgroup distinct issues or split similar ones caused by the same root cause. To address this, we present ECHO, a lightweight call stack-based deduplication tool that analyzes structural similarity among kernel stack traces. By computing the longest common subsequence (LCS) between normalized call stacks, ECHO groups semantically related crashes and improves post-fuzzing analysis. We integrate ECHO into the Syzkaller fuzzing workflow and use it to prioritize inputs that trigger deeper, previously untested kernel paths. Evaluated across multiple Linux kernel versions, ECHO improves average code coverage by 15.2% and discovers 20 previously unknown bugs, all reported to the Linux kernel community. Our results highlight that stack-aware crash grouping not only streamlines triage, but also enhances fuzzing efficiency by guiding seed selection toward unexplored execution paths.
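The core grouping step can be sketched with a longest-common-subsequence similarity over normalized stack frames, in the spirit of the approach described above. The frame names, the normalization, and any threshold are invented for illustration; they are not ECHO's actual data or parameters.

```python
# LCS-based similarity between two normalized kernel call stacks.
def lcs_len(a: list, b: list) -> int:
    # Classic dynamic program: dp[i][j] = LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def similarity(stack_a: list, stack_b: list) -> float:
    # Normalize to [0, 1] by the longer stack, so identical stacks score 1.0.
    return lcs_len(stack_a, stack_b) / max(len(stack_a), len(stack_b))

# Two crashes sharing most of their call path, and one that does not
# (hypothetical frame names):
s1 = ["do_syscall", "vfs_write", "ext4_file_write", "kmalloc", "report_oops"]
s2 = ["do_syscall", "vfs_write", "ext4_file_write", "memcpy", "report_oops"]
s3 = ["do_syscall", "vfs_read", "pipe_read", "report_oops"]

print(similarity(s1, s2))  # 0.8 -> plausibly the same root cause
print(similarity(s1, s3))  # 0.4 -> probably a distinct bug
```

Grouping crashes whose pairwise similarity exceeds a chosen threshold then yields the clusters used for triage and seed prioritization.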
- Research Article
- 10.1145/3748505
- Jul 14, 2025
- ACM Transactions on Software Engineering and Methodology
- Chen Yang + 4 more
Automatic test generation plays a critical role in software quality assurance. While the recent advances in Search-Based Software Testing (SBST) and Large Language Models (LLMs) have shown promise in generating useful tests, these techniques still struggle to cover certain branches. Reaching these hard-to-cover branches usually requires constructing complex objects and resolving intricate inter-procedural dependencies in branch conditions, which poses significant challenges for existing techniques. In this work, we propose TELPA, a novel technique aimed at addressing these challenges. Its key insight lies in extracting real usage scenarios of the target method under test to learn how to construct complex objects and extracting methods entailing inter-procedural dependencies with hard-to-cover branches to learn the semantics of branch constraints. To enhance efficiency and effectiveness, TELPA identifies a set of ineffective tests as counter-examples for LLMs and employs a feedback-based process to iteratively refine these counter-examples. Then, TELPA integrates program analysis results and counter-examples into the prompt, guiding LLMs to gain deeper understandings of the semantics of the target method and generate diverse tests that can reach the hard-to-cover branches. Our experimental results on 27 open-source Python projects demonstrate that TELPA significantly outperforms the state-of-the-art SBST and LLM-enhanced techniques, achieving an average improvement of 34.10% and 25.93% in terms of branch coverage.
- Research Article
- 10.1142/s0218194025500329
- Jul 1, 2025
- International Journal of Software Engineering and Knowledge Engineering
- Yating Yang + 2 more
Programming education in computer science is growing rapidly, and debugging is a key challenge for novice programmers due to their limited experience. Mutation-Based Fault Localization (MBFL) is widely used in industry, but its effectiveness and challenges in novice programs need further study. While Python is a popular language in machine learning and data science, there is little research comparing fault localization in Python and Java for novice programmers. To bridge this gap, we conduct an empirical study to evaluate MBFL’s accuracy and execution overhead in common novice programming errors across different languages. We analyze how program features like code coverage and mutation score affect MBFL’s performance and whether these effects differ between languages. We also examine how MBFL’s effectiveness changes when suspiciousness scores are the same and how mutant noise and coincidental correct test cases vary across languages. Additionally, we propose a mutation confidence formula based on repair potential and behavioral difference to assess the usefulness of mutants in MBFL. Our study demonstrates that MBFL works well for novice fault localization in both Java and Python, with Python performing better. MBFL correctly identifies 45, 70, and 92 faults within the TOP-N (N = 1, 3, 5), proving its strong performance. However, tie problems, mutant noise, and coincidental correct test cases weaken MBFL, especially in Java. Results in both languages show a strong positive correlation between mutant confidence and fault localization accuracy, confirming the formula’s effectiveness across languages.
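The scoring step of MBFL can be sketched as follows: mutants planted on a statement make it suspicious when they are killed mainly by failing tests. The tiny program state below (mutants, kill sets, test outcomes) is invented for illustration, and the formula is the Ochiai-style variant commonly used in MBFL tools, not necessarily the confidence formula proposed in this paper.

```python
# Metallaxis-style suspiciousness for a single mutant's kill pattern.
def mutant_suspiciousness(killed_failing: int, killed_passing: int,
                          total_failing: int) -> float:
    denom = (total_failing * (killed_failing + killed_passing)) ** 0.5
    return killed_failing / denom if denom else 0.0

failing, passing = {"t1", "t2"}, {"t3", "t4", "t5"}

# mutant -> (statement it mutates, tests that kill it); all hypothetical.
mutants = {
    "m1": ("line 3", {"t1", "t2"}),  # killed only by failing tests
    "m2": ("line 3", {"t1", "t4"}),  # killed by a mix
    "m3": ("line 7", {"t3", "t4"}),  # killed only by passing tests
}

# A statement inherits the score of its most suspicious mutant.
scores: dict = {}
for stmt, kills in mutants.values():
    s = mutant_suspiciousness(len(kills & failing), len(kills & passing),
                              len(failing))
    scores[stmt] = max(scores.get(stmt, 0.0), s)

ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # the faulty "line 3" outranks "line 7"
```

Ties in such rankings (several statements sharing one score) are precisely the "tie problems" the study identifies as a weakness of MBFL.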
- Research Article
- 10.1145/3728882
- Jun 22, 2025
- Proceedings of the ACM on Software Engineering
- Anshunkang Zhou + 2 more
Parallel fuzzing, which utilizes multicore computers to accelerate the fuzzing process, has been widely used in industrial-scale software defect detection. However, specifying efficient parallel fuzzing strategies for programs with different characteristics is challenging due to the difficulty of reasoning about fuzzing runtime statically. Existing efforts still use pre-defined tactics for various programs, resulting in suboptimal performance. In this paper, we propose Kraken, a new program-adaptive parallel fuzzer that improves fuzzing efficiency through dynamic strategy optimization. The key insight is that the inefficiency in parallel fuzzing can be observed during runtime through various feedbacks, such as code coverage changes, which allows us to adjust the adopted strategy to avoid inefficient path searching, thus gradually approximating the optimal policy. Based on this insight, our key idea is to view the task of finding the optimal strategy as an optimization problem and gradually approach the best program-specific strategy on the fly by maximizing certain objective functions. We have implemented Kraken in C/C++ and evaluated it on 19 real-world programs against 6 state-of-the-art parallel fuzzers. Experimental results show that Kraken can achieve 54.7% more code coverage and find 70.2% more bugs in the given time. Moreover, Kraken has found 192 bugs in 37 popular open-source projects, 119 of which have been assigned CVE IDs.