Articles published on Worst-case execution time
524 Search results
- Research Article
- 10.7717/peerj-cs.3372
- Nov 18, 2025
- PeerJ Computer Science
- Shaya Alshaya + 3 more
The accurate and efficient simulation of biosensors is essential for applications in healthcare, environmental monitoring, and diagnostics. This study presents a co-simulation framework integrating COMSOL Multiphysics and Continuous DIscrete Simulation (CODIS+), enabling a synchronized and multi-domain simulation approach to enhance the accuracy and execution time estimation of biosensor systems. The proposed framework leverages COMSOL for high-fidelity multiphysics modeling of biosensor behavior and CODIS+ for real-time signal processing, incorporating a 1D Convolutional Neural Network (CNN) for advanced noise reduction. Furthermore, Worst-Case Execution Time (WCET) estimation is implemented to ensure predictable real-time performance, relying on profiling tools within SystemC and CODIS+. Unlike traditional standalone simulations, the proposed framework eliminates iterative feedback between control and physical modeling, optimizing computational efficiency while maintaining high detection accuracy. A high-fidelity COMSOL model is used as the reference for validation due to the absence of experimental data, ensuring a reliable benchmark for performance evaluation. The framework achieves a low Execution Time Error (ETE) of approximately 4%, validating the precision of execution time estimation and ensuring computational predictability. Performance evaluation is conducted using Root Mean Square Error (RMSE) and Signal-to-Noise Ratio (SNR) metrics. The proposed approach achieves a significant reduction in RMSE (from 7.8 to 2.1) and outperforms traditional noise reduction techniques in terms of SNR improvement, demonstrating its effectiveness in preserving biosensor signal integrity. These results confirm that integrating physics-based modeling with AI-driven noise filtering enhances both biosensor signal accuracy and real-time feasibility. 
The validation presented in this study is based solely on simulation and profiling results; hardware-level testing is planned for future work. The proposed co-simulation framework presents a scalable and reliable solution for optimizing biosensor design and real-time signal processing, ensuring its applicability in critical biomedical and environmental monitoring applications. It underscores the extensibility, modularity, and reusability of our integration approach, allowing other COMSOL models and CODIS+ functionalities to be easily incorporated and customized.
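The RMSE and SNR metrics used in this evaluation are standard signal-quality measures. A minimal sketch follows; the signal values are illustrative only, not data from the paper:

```python
import math

def rmse(reference, estimate):
    """Root Mean Square Error between a reference and an estimated signal."""
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference))

def snr_db(signal, noise):
    """Signal-to-Noise Ratio in decibels, from average signal and noise power."""
    p_signal = sum(s ** 2 for s in signal) / len(signal)
    p_noise = sum(n ** 2 for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

# Illustrative values only.
clean = [1.0, 2.0, 3.0, 2.0, 1.0]
denoised = [1.1, 1.9, 3.2, 2.1, 0.9]
residual = [c - d for c, d in zip(clean, denoised)]
print(round(rmse(clean, denoised), 3))      # -> 0.126
print(round(snr_db(clean, residual), 1))    # -> 23.8
```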
- Research Article
- 10.1145/3768344
- Nov 11, 2025
- ACM Transactions on Design Automation of Electronic Systems
- Miguel Alcon + 3 more
The provision of increasingly advanced autonomous software functionalities builds on cutting-edge autonomous driving frameworks to enable modular interactions among multiple software components. This approach helps to support functional cause-effect chains from multiple sensors to actuators. The complexity of the (software) component interactions makes it more difficult to ascertain the correctness of the timing behavior of the system. This is because traditional timing-related metrics like worst-case execution time and worst-case response time do not capture the inter-dependency in cause-effect chains between the input sampling time and the time at which computation based on those inputs is performed. Complementary timing-related metrics, such as maximum reaction time and maximum data age, have been considered to capture timing requirements, typically with an end-to-end scope, in cause-effect chains. These metrics have been formalized and demonstrated in ROS2-based automotive and autonomous driving setups [44, 46]. However, the formalization of those metrics, which is necessary for deriving analytical lower and upper bounds and monitoring them at run-time, largely depends on the execution model and semantics offered by the run-time. Any concrete application of those metrics needs to be tailored and adapted to the system at hand. Apollo Auto is a popular, industrial-quality, open-source autonomous driving framework that is seeing increasing adoption in both industrial and academic projects. Apollo builds on CyberRT, an ad-hoc run-time that is similar in mechanism and intent to ROS2 but differs from it in execution model and supported semantics. In contrast to ROS2, CyberRT is highly specialized to support the Apollo AD framework, and is neither extensively documented nor thoroughly analysed in the literature, especially in relation to its execution model and the instantiation of timing-related metrics.
In this work, for the first time, we provide an insightful analysis and discussion of the CyberRT execution model and semantics, starting from its raw and sparsely documented codebase. Based on the identified semantics, we elaborate a formalization of timing-related metrics on CyberRT across different granularity scopes, namely the end-to-end and node levels. In particular, we elaborate on the importance of node-level timing properties to intercept any latent timing misbehavior before it severely impacts end-to-end execution. We provide a concrete mapping of a comprehensive set of timing-related metrics to the CyberRT execution model, at both the end-to-end and node levels, and develop a monitoring library that allows these metrics to be intercepted on the specific software stack. We exploit the proposed library on a set of Apollo autonomous driving scenarios to demonstrate its effectiveness in monitoring the considered timing metrics and in promptly intercepting a subtle timing misbehavior beyond the end-to-end execution scope in a representative autonomous driving stack.
- Research Article
- 10.1145/3761812
- Sep 26, 2025
- ACM Transactions on Embedded Computing Systems
- Srinivasan Subramaniyan + 1 more
GPUs have recently been adopted in many real-time embedded systems. However, existing GPU scheduling solutions are mostly open-loop and rely on the estimation of worst-case execution time (WCET). Although adaptive solutions, such as feedback control scheduling, have been previously proposed to handle this challenge for CPU-based real-time tasks, they cannot be directly applied to GPUs, because GPUs have different and more complex architectures, and schedulable utilization bounds are not yet available for them. In this article, we propose FC-GPU, the first Feedback Control GPU scheduling framework for real-time embedded systems. To model the GPU resource contention among tasks, we analytically derive a multi-input-multi-output (MIMO) system model that captures the impacts of task rate adaptation on the response times of different tasks. Building on this model, we design a MIMO controller that dynamically adjusts task rates based on measured response times. Our extensive hardware testbed results on an Nvidia RTX 3090 GPU and an AMD MI-100 GPU demonstrate that FC-GPU can provide better real-time performance even when task execution times increase significantly at runtime.
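The core feedback idea is to lower a task's rate when its measured response time exceeds the target and to raise it when there is slack. FC-GPU itself uses a MIMO model-based controller; the scalar sketch below is only illustrative, with an assumed gain and rate bounds:

```python
def adapt_rate(rate, measured_rt, target_rt, gain=0.5, min_rate=1.0, max_rate=100.0):
    """One proportional feedback step on a task's invocation rate (Hz).

    Hypothetical controller: the real FC-GPU controller is MIMO and
    model-based; the gain and rate bounds here are arbitrary assumptions."""
    error = (target_rt - measured_rt) / target_rt   # positive when there is slack
    new_rate = rate * (1.0 + gain * error)
    return max(min_rate, min(max_rate, new_rate))

# Measured response time 25 ms against a 20 ms target: the rate drops.
print(adapt_rate(50.0, measured_rt=25.0, target_rt=20.0))   # -> 43.75
```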
- Research Article
- 10.1145/3761814
- Sep 26, 2025
- ACM Transactions on Embedded Computing Systems
- Abigail Eisenklam + 6 more
As multicore hardware becomes increasingly prevalent in real-time embedded systems, traditional scheduling techniques that assume a single worst-case execution time for each task are no longer adequate, as they fail to account for the impact of shared resources—such as cache and memory bandwidth—on execution time. When tasks execute concurrently on different cores, their execution times can vary substantially with their allocated resources. Moreover, the instruction rate of a task during a job execution varies with time, and this variation pattern differs across tasks. Therefore, to improve performance it is crucial to incorporate the relationship between the resource budget allocated to each task and its time-varying instruction rate in task modeling, resource allocation, and scheduling algorithm design. Yet, no prior work has considered the fine-grained dynamic resource allocation and scheduling problems jointly while also providing hard real-time guarantees. In this article, we introduce a resource-dependent multi-phase timing model that captures the time-varying instruction rates of a task under different resource allocations and that enables worst-case analysis under dynamic allocation. We present a method for constructing estimates of such a model based on task execution profiles, which can be obtained through measurements. We then present Rasco, a co-design technique for multicore resource allocation and scheduling of real-time DAG applications with end-to-end deadlines. Rasco leverages the resource-dependent multi-phase model of each task to simultaneously allocate resources at a fine granularity and assign task deadlines. This approach maximizes execution progress under resource constraints while providing hard real-time schedulability guarantees. Our evaluation shows that Rasco substantially enhances schedulability and reduces end-to-end latency compared to the state of the art.
- Research Article
- 10.48084/etasr.10990
- Jun 4, 2025
- Engineering, Technology & Applied Science Research
- Ammar Merazga + 5 more
Real-time systems need communication networks, as they often operate across multiple physical nodes. CAN-BUS is a common field bus used in these systems. Such systems require a time-based analysis to meet key deadlines and ensure system safety. This study designs and implements a distributed embedded motor control system using FreeRTOS over CAN-BUS for real-time operation. A prototype was built with low-cost components such as an Arduino, an L298N driver, and an MCP2515 module. A WCET analysis was performed on the system. The system has two CAN nodes connected to a PC via PCAN-USB for testing and analysis using Busmaster software. The first CAN node controls the DC motor speed using a real-time PID controller, and the other manages the motor speed through CAN. The experimental results of the PID controller showed a low steady-state error of less than 0.3%. As the speed increases, there is less overshoot. The settling time is also short, proving the stability of the system. Performance was verified by comparing it with the signals from the Busmaster software. The WCET analysis used the Bound-T tool on the AVR2560 microprocessor (16 MHz, without cache or pipeline). The calculated WCETs for three tasks were 1228.88 µs, 210.19 µs, and 2786.25 µs. This work verifies the schedulability of tasks for applications on a FreeRTOS real-time embedded platform. Bound-T is an open-source tool for static WCET analysis that has shown strong potential and can be used to perform precise and reliable temporal analyses.
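A discrete PID speed controller of the kind used here can be sketched as follows; the gains, sampling period, and numbers are illustrative assumptions, not values from the study:

```python
class PID:
    """Discrete PID controller (illustrative gains, not the study's)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt                  # accumulated error
        derivative = (error - self.prev_error) / self.dt  # error slope
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=0.8, ki=0.2, kd=0.05, dt=0.1)
# Error 10 with no history: P = 8.0, I = 0.2 * 1.0 = 0.2, D = 0.05 * (10 / 0.1) = 5.0
print(round(pid.update(setpoint=100.0, measured=90.0), 1))   # -> 13.2
```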
- Research Article
- 10.14429/dsj.20751
- Mar 24, 2025
- Defence Science Journal
- Balakrishnan P + 2 more
Efficient task partitioning and scheduling on multicore processors are critical for optimizing performance and resource utilization in real-time systems. This paper explores a dynamic approach to task partitioning and scheduling, leveraging Intel Cache Allocation Technology (CAT) and pseudo-locking to enhance predictability and reduce inter-core interference. By dynamically allocating cache resources to critical tasks, partitioning high-frequency tasks into a separate cluster, and isolating them from contention, the system achieves improved schedulability. Additionally, an adaptive Earliest Deadline First (EDF) scheduling algorithm is introduced, which allocates tasks to free cores in real time based on workload variations and resource availability. The proposed techniques are validated through typical applications in signal processing and other similar systems, where high throughput, low latency, and strict timing constraints are paramount. Experimental results of the Modified-EDF approach demonstrated a 4.6% reduction in Worst-Case Execution Time (WCET) compared to SCHED_FIFO and a 2.3% decrease in CPU utilization. Similarly, it achieved a 4.2% improvement in WCET over SCHED_RR and a 2.3% improvement over SCHED_DEADLINE, highlighting efficiency gains through deadline sensitivity and cache-awareness and making this approach highly suitable for safety-critical and high-performance computing environments.
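Plain EDF, which the Modified-EDF approach builds on, runs the ready task whose absolute deadline is nearest. A textbook sketch follows; the task names and deadlines are invented, and none of the paper's cache-awareness is modeled:

```python
def edf_pick(ready_tasks):
    """Earliest Deadline First: among ready tasks, select the one with the
    nearest absolute deadline (textbook EDF, not the paper's Modified-EDF)."""
    return min(ready_tasks, key=lambda t: t["deadline"])

# Hypothetical ready queue; deadlines in milliseconds from now.
ready = [
    {"name": "sensor_fusion", "deadline": 12.0},
    {"name": "fft_block",     "deadline": 5.0},
    {"name": "logging",       "deadline": 30.0},
]
print(edf_pick(ready)["name"])   # -> fft_block
```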
- Research Article
- 10.1007/s11241-025-09436-w
- Mar 1, 2025
- Real-Time Systems
- Shashank Jadhav + 1 more
Real-time embedded systems need to meet timing and energy constraints to avoid potential disasters. Compiler-level ScratchPad Memory (SPM) allocation can be used to optimize a program’s Worst-Case Execution Time (WCET) and energy consumption. However, static allocation is limited by SPM size constraints. Dynamic SPM allocation resolves this by allocating code to SPM during runtime, but copying code using the CPU increases WCET and energy consumption. To address this, we integrate a Direct Memory Access (DMA) model and DMA analysis at the compiler level and propose a single-objective DMA Call Placement Optimization (DCPO). In this paper, we consider functions and loops as dynamic allocation candidates. DCPO finds appropriate places within the code to place DMA transfer calls such that the DMA controller and the CPU run in parallel—minimizing the total execution time required by the DMA controller for dynamic allocation of functions and loops during runtime. Additionally, we propose a compiler-level DMA-aware multi-objective dynamic SPM allocation that uses DCPO and simultaneously minimizes WCET and energy objectives, yielding Pareto optimal solutions. Comparative evaluations demonstrate the superiority of our approach over state-of-the-art multi- and single-objective optimizations.
- Research Article
- 10.1017/cbp.2025.1
- Jan 30, 2025
- Research Directions: Cyber-Physical Systems
- Martin Schoeberl + 6 more
Real-time systems need to be built out of tasks for which the worst-case execution time is known. To enable accurate estimates of worst-case execution time, some researchers propose to build processors that simplify that analysis. These architectures are called precision-timed machines or time-predictable architectures. However, what does this term mean? This paper explores the meaning of time predictability and how it can be quantified. We show that time predictability is hard to quantify. Rather, the worst-case performance of the combination of a processor, a compiler, and a worst-case execution time analysis tool is an important property in the context of real-time systems. Note that the actual software also has implications for the worst-case performance. We propose to define a standard set of benchmark programs that can be used to evaluate a time-predictable processor, a compiler, and a worst-case execution time analysis tool. We define worst-case performance as the geometric mean of worst-case execution time bounds on a standard set of benchmark programs.
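The proposed worst-case performance measure, the geometric mean of WCET bounds over a benchmark set, is straightforward to compute; the bounds below are hypothetical:

```python
import math

def worst_case_performance(wcet_bounds):
    """Geometric mean of WCET bounds across a benchmark set, as proposed for
    comparing processor/compiler/WCET-analysis-tool combinations."""
    return math.exp(sum(math.log(b) for b in wcet_bounds) / len(wcet_bounds))

# Hypothetical WCET bounds (in cycles) for four benchmarks.
print(round(worst_case_performance([2000, 8000, 4000, 1000]), 1))   # -> 2828.4
```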
- Research Article
- 10.1109/access.2025.3606480
- Jan 1, 2025
- IEEE Access
- Hayate Toba + 2 more
Autonomous driving systems have been extensively researched for safety. To ensure safety, a strict deadline is enforced before sensor data influences vehicle control. A common approach to this issue is Directed Acyclic Graph (DAG) scheduling. However, existing studies use fixed execution-time values, typically the Worst-Case Execution Time (WCET). During actual vehicle operation, task execution times vary due to factors such as input size (e.g., the number of surrounding vehicles), and the actual execution time is often smaller than the WCET. This execution time variation causes the actual results of DAG scheduling to deviate from the intended strategy. To address this problem, this paper proposes a DAG scheduling algorithm that considers probabilistic execution times to reduce the number of deadline misses. For consecutive timer-driven nodes, a waiting time may occur during which executing higher-priority nodes does not affect end-to-end deadline misses. The proposed scheduling algorithm calculates this waiting time and, during this time, executes nodes outside the end-to-end path that have high deadline-miss rates. In DAGs with high utilization, the evaluation results of the proposed scheduling algorithm demonstrate that the number of deadline misses decreases compared to existing methods. Specifically, our evaluation results indicate an approximate 8.3% improvement in deadline achievement rate compared to existing methods in high-utilization scenarios.
- Research Article
- 10.1145/3695768
- Oct 5, 2024
- ACM Transactions on Embedded Computing Systems
- Thilo Leon Fischer + 1 more
The impact of preemptions has to be considered when determining the schedulability of a task set in a preemptively scheduled system. In particular, the contents of caches can be disturbed by a preemption, thus creating context-switching costs. These context-switching costs occur when a preempted task needs to reload data from memory after a preemption. The additional delay created by this effect is termed cache-related preemption delay (CRPD). The analysis of CRPD has been extensively studied for single-level caches in the past. However, for two-level caches, the analysis of CRPD is still an emerging area of research. In contrast to a single-level cache, which is only affected by direct preemption effects, the second-level cache in a two-level hierarchy can be subject to indirect interference after a preemption. Accesses that could be served from the L1 cache in the absence of preemptions may be forwarded to the L2 cache, as the relevant data was evicted by a preemption. These accesses create indirect interference in the L2 cache and can cause further evictions. Recently, a CRPD analysis for two-level non-inclusive cache hierarchies was proposed. In this article, we show that this state-of-the-art analysis is unsafe, as it potentially underestimates the CRPD. Furthermore, we show that the analysis is pessimistic and can overestimate the indirect preemption effects. To address these issues, we propose a novel analysis approach for the CRPD in a two-level non-inclusive cache hierarchy. We prove the correctness of the presented approach based on the set of feasible program execution traces. We implemented the presented approach in a worst-case execution time (WCET) analysis tool and compared its performance to existing analysis methods. Our evaluation shows that the presented analysis increases task set schedulability by up to 14 percentage points compared with the state-of-the-art analysis.
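For a single-level cache, the classic CRPD bound counts the preempted task's useful cache blocks (UCBs) that intersect the preempting task's evicting cache blocks (ECBs), multiplied by the block reload time; the two-level analysis in this article goes well beyond this. The block sets and reload cost below are invented for illustration:

```python
def crpd_bound(ucb, ecb, block_reload_time):
    """Classic single-level CRPD bound: useful blocks of the preempted task
    that the preempting task may evict, times the cost of one reload."""
    return len(ucb & ecb) * block_reload_time

# Hypothetical cache-set indices and a 10-cycle reload cost.
print(crpd_bound({1, 2, 3, 7}, {2, 3, 4, 5}, block_reload_time=10))   # -> 20
```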
- Research Article
- 10.1007/s11241-024-09430-8
- Sep 30, 2024
- Real-Time Systems
- Thilo L Fischer + 1 more
In multi-core architectures, the last-level cache (LLC) is often shared between cores. Sharing the LLC leads to inter-core interference, which impacts system performance and predictability. This means that tasks running in parallel on different cores may experience additional LLC misses as they compete for cache space. To compute a task’s worst-case execution time (WCET), a safe bound on the inter-core cache interference has to be determined. We propose an interference analysis for set-associative shared least-recently-used caches. The analysis leverages timing information to establish tight bounds on the worst-case interference and classifies individual accesses as either cache hits or potential cache misses. We evaluated the analysis performance for systems containing 2 and 4 cores using shared caches up to 64 KB. The evaluation shows an average WCET reduction of up to 23.3% for dual-core systems and 8.5% for quad-core systems.
- Research Article
- 10.1016/j.micpro.2024.105103
- Sep 19, 2024
- Microprocessors and Microsystems
- Francesco Cosimi + 3 more
SLOPE: Safety LOg PEripherals implementation and software drivers for a safe RISC-V microcontroller unit
- Research Article
- 10.1016/j.sysarc.2024.103266
- Aug 28, 2024
- Journal of Systems Architecture
- Markel Galarraga + 3 more
Safety-critical applications in the modern transportation and industrial domains, such as autonomous vehicles and collaborative robots, exhibit a combination of escalating software complexity and the need to integrate diverse software stacks and machine learning algorithms, consequently demanding complex high-performance hardware. Linux’s extensive platform support and library ecosystem make it a valuable general-purpose operating system for developing complex software systems. However, because the Linux kernel has not been designed to comply with safety standards, it has high execution-path variability and does not provide execution time guarantees. In this context, several research initiatives have studied the usage of Linux for developing complex safety-related systems, focusing on topics that include its development process, isolation architectures, and test coverage estimation. Nonetheless, execution-time analysis and providing temporal guarantees remain a challenge. This work extends the novel statistical analysis of Linux system call execution paths with an analysis of execution-time variability and proposes a method for estimating the worst-case execution time, forming a sound approach for an in-depth analysis of Linux kernel execution paths and execution times for safety-related systems. The proposed method is applied to a representative use case that implements an Autonomous Emergency Brake application on an NVIDIA Jetson Nano board connected to the CARLA autonomous driving simulator.
- Research Article
- 10.3390/app14167277
- Aug 19, 2024
- Applied Sciences
- Meng Li + 3 more
To ensure the timely execution of hard real-time applications, scheduling analysis techniques must consider safe upper bounds on the possible execution durations of tasks or runnables, referred to as Worst-Case Execution Times (WCET). Bounding WCET requires not only program path analysis but also modeling the impact of micro-architectural features present in modern processors. In this paper, we model the ARMv8 ISA and micro-architecture, including the instruction cache, branch predictor, instruction prefetching strategies, and out-of-order pipeline. We also consider the complex interactions between these features (e.g., cache misses caused by branch predictions and branch misses caused by instruction pipelines) and estimate the WCET of the program using the Implicit Path Enumeration Technique (IPET), a static WCET analysis method. We compare the estimated WCET of benchmarks with the observed WCET on two ARMv8 boards. The ratio of estimated to observed WCET is greater than 1 for all benchmarks, demonstrating the safety of the analysis.
- Research Article
- 10.1145/3656452
- Jun 20, 2024
- Proceedings of the ACM on Programming Languages
- Arjun Pitchanathan + 2 more
Compilers often use performance models to decide how to optimize code. This is often preferred over using hardware performance measurements, since hardware measurements can be expensive, limited by hardware availability, and make the output of compilation non-deterministic. Analytical models, on the other hand, serve as efficient and noise-free performance indicators. Since many optimizations focus on improving memory performance, memory cache miss rate estimations can serve as an effective and noise-free performance indicator for superoptimizers, worst-case execution time analyses, manual program optimization, and many other performance-focused use cases. Existing methods to model the cache behavior of affine programs work on small programs such as those in the Polybench benchmark but do not scale to the larger programs we would like to optimize in production, which can be orders of magnitude bigger by lines of code. These analytical approaches hand off the whole program to a Presburger solver and perform expensive mathematical operations on the huge resulting formulas. We develop a scalable cache model for affine programs that splits the computation into smaller pieces that do not trigger the worst-case asymptotic behavior of these solvers. We evaluate our approach on 46 TorchVision neural networks, finding that our model has a geomean runtime of 44.9 seconds compared to over 32 minutes for the state-of-the-art prior cache model; the latter figure understates the true runtime, because the prior model reached our four-hour time limit on 54% of the networks, a limit our tool never reached. Our model exploits parallelism effectively: running it on sixteen cores is 8.2x faster than running it single-threaded.
While the state-of-the-art model takes over four hours to analyze a majority of the benchmark programs, Falcon produces results in at most 3 minutes and 3 seconds; moreover, after a local modification to the program being analyzed, our model efficiently updates the predictions in 513 ms on average (geomean). Thus, we provide the first scalable analytical cache model.
- Research Article
- 10.1007/s11241-024-09422-8
- Jun 17, 2024
- Real-Time Systems
- Alexander Zuepke + 4 more
In today’s multiprocessor systems-on-a-chip, the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution time analysis. Memory regulation via throttling is one of the most practical techniques to mitigate interference. Traditional regulation schemes rely on a combination of timer and performance counter interrupts to be delivered and processed on the same cores running real-time workload. Unfortunately, to prevent excessive overhead, regulation can only be enforced at a millisecond-scale granularity. In this work, we present a novel regulation mechanism from outside the cores that monitors performance counters for the application core’s activity in main memory at a microsecond scale. The approach is fully transparent to the applications on the cores, and can be implemented using widely available on-chip debug facilities. The presented mechanism also allows more complex composition of metrics to enact load-aware regulation. For instance, it allows redistributing unused bandwidth between cores while keeping the overall memory bandwidth of all cores below a given threshold. We implement our approach on a host of embedded platforms and conduct an in-depth evaluation on the Xilinx Zynq UltraScale+ ZCU102, NXP i.MX8M and NXP S32G2 platforms using the San Diego Vision Benchmark Suite.
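The load-aware redistribution idea, granting each core its demand up to a base budget and sharing the remaining capacity under the global cap among cores that still need more, can be sketched as follows. The policy and numbers are assumptions for illustration, not the paper's mechanism:

```python
def redistribute(demand, base_budget, total_cap):
    """Hypothetical load-aware bandwidth split (units: MB/s per window).

    Each core first gets min(demand, base_budget); leftover capacity under
    total_cap is then shared equally among cores that still want more."""
    grant = [min(d, base_budget) for d in demand]
    needy = [i for i, d in enumerate(demand) if d > grant[i]]
    if needy:
        extra = (total_cap - sum(grant)) / len(needy)
        for i in needy:
            grant[i] = min(demand[i], grant[i] + extra)
    return grant

# Two cores exceed the 200 MB/s base budget; slack under the 900 MB/s cap is shared.
print(redistribute([100, 300, 500], base_budget=200, total_cap=900))
```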
- Research Article
- 10.1016/j.future.2024.06.015
- Jun 10, 2024
- Future Generation Computer Systems
- Jaewoo Lee + 1 more
IMC-PnG: Maximizing runtime performance and timing guarantee for imprecise mixed-criticality real-time scheduling
- Research Article
- 10.1016/j.sysarc.2024.103189
- Jun 7, 2024
- Journal of Systems Architecture
- Xuanliang Deng + 7 more
Partitioned scheduling with safety-performance trade-offs in stochastic conditional DAG models
- Research Article
- 10.1145/3617176
- Dec 21, 2023
- ACM Transactions on Software Engineering and Methodology
- Jaekwon Lee + 3 more
Weakly hard real-time systems can, to some degree, tolerate deadline misses, but their schedulability still needs to be analyzed to ensure their quality of service. Such analysis usually occurs at early design stages to provide implementation guidelines to engineers so they can make better design decisions. Estimating worst-case execution times (WCET) is a key input to schedulability analysis. However, early on during system design, estimating WCET values is challenging, and engineers usually determine them as plausible ranges based on their domain knowledge. Our approach aims at finding restricted, safe WCET sub-ranges given a set of ranges initially estimated by experts in the context of weakly hard real-time systems. To this end, we leverage (1) multi-objective search aiming at maximizing the violation of weakly hard constraints to find worst-case scheduling scenarios and (2) polynomial logistic regression to infer safe WCET ranges with a probabilistic interpretation. We evaluated our approach by applying it to an industrial system in the satellite domain and several realistic synthetic systems. The results indicate that our approach significantly outperforms a baseline relying on random search without learning and estimates safe WCET ranges with a high degree of confidence in practical time (< 23 h).
- Research Article
- 10.1080/17445760.2023.2293913
- Dec 19, 2023
- International Journal of Parallel, Emergent and Distributed Systems
- Arun S Nair + 5 more
CAMP proposes a hierarchical cache subsystem for multi-core mixed-criticality processors, focusing on ensuring worst-case execution time (WCET) predictability in automotive applications. It incorporates criticality-aware locked L1 and L2 caches, reconfigurable at mode-change intervals, along with criticality-aware last-level cache partitioning. Evaluation using the CACOSIM, Moola Multicore, and CACTI simulation tools confirms the suitability of CAMP for keeping high-criticality jobs within timing budgets. A practical case study involving an automotive wake-up controller using the Sniper v7.2 architecture simulator further validates its usability in real-world mixed-criticality applications. CAMP presents a promising cache architecture for optimized multi-core mixed-criticality systems.