FC-GPU: Feedback Control GPU Scheduling for Real-time Embedded Systems
GPUs have recently been adopted in many real-time embedded systems. However, existing GPU scheduling solutions are mostly open-loop and rely on the estimation of worst-case execution time (WCET). Although adaptive solutions, such as feedback control scheduling, have been previously proposed to handle this challenge for CPU-based real-time tasks, they cannot be directly applied to GPUs, because GPUs have different and more complex architectures and schedulable utilization bounds are not yet available for them. In this article, we propose FC-GPU, the first Feedback Control GPU scheduling framework for real-time embedded systems. To model the GPU resource contention among tasks, we analytically derive a multi-input-multi-output (MIMO) system model that captures the impact of task rate adaptation on the response times of different tasks. Building on this model, we design a MIMO controller that dynamically adjusts task rates based on measured response times. Our extensive hardware testbed results on an Nvidia RTX 3090 GPU and an AMD MI-100 GPU demonstrate that FC-GPU provides better real-time performance even when task execution times increase significantly at runtime.
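To make the control loop concrete, the following is a minimal sketch of a feedback rate controller in the spirit of the one described above; it is an illustration only, not the authors' FC-GPU controller. The gain matrix K, the setpoint policy (80% of the deadline), and the rate bounds are all assumed values.

```python
# Minimal sketch of a feedback rate controller in the spirit of FC-GPU
# (hypothetical illustration, not the authors' controller). Assumptions:
# each task i has a rate r[i] (jobs/s), a relative deadline d[i], and a
# measured response time resp[i]; a MIMO gain matrix K maps response-time
# errors to rate adjustments.
import numpy as np

def control_step(rates, resp_times, deadlines, K, r_min, r_max):
    """One control period: adjust task rates from response-time errors."""
    # Error: distance of each measured response time from its setpoint
    # (here the setpoint is taken as a fraction of the deadline).
    setpoints = 0.8 * deadlines                      # assumed setpoint policy
    errors = (setpoints - resp_times) / deadlines    # normalized error
    # MIMO update: every task's error can influence every task's rate.
    new_rates = rates + K @ errors
    return np.clip(new_rates, r_min, r_max)

# Example with two tasks and a hand-picked (hypothetical) gain matrix.
rates = np.array([30.0, 20.0])            # jobs per second
deadlines = np.array([0.033, 0.050])      # seconds
measured = np.array([0.040, 0.045])       # measured response times
K = np.array([[2.0, -0.5],
              [-0.5, 1.5]])               # illustrative coupling gains
print(control_step(rates, measured, deadlines, K, r_min=5.0, r_max=60.0))
```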
- Conference Article
82
- 10.4230/oasics.wcet.2009.2291
- Jan 1, 2009
In this paper we present a measurement-based approach that produces both a WCET (Worst Case Execution Time) estimate and a prediction of the probability that a future execution time will exceed that estimate. Our statistical approach uses extreme value theory to build a model of the tail behavior of the measured execution times. We validate our approach using an industrial data set comprising over 150 sampled components and nearly 200 million sample execution times. Each trace is divided into two segments, with one used to make the WCET estimate and the second used to check our prediction of the fraction of future execution time samples that exceed our WCET estimate. We show that, compared to WCET estimates derived from the worst-case observed time, our WCET estimates significantly improve the ability to predict the probability that the WCET estimate is exceeded.
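As an illustration of the general peaks-over-threshold flavor of such EVT-based estimation (not the authors' exact procedure), the following sketch fits a Generalised Pareto tail to measured execution times and reads off a WCET estimate at a target exceedance probability. The threshold quantile and target probability are assumptions.

```python
# A minimal sketch of a measurement-based pWCET estimate via extreme value
# theory (peaks-over-threshold with a Generalised Pareto tail). Illustration
# of the general technique only; threshold choice and target exceedance
# probability are assumed values.
import numpy as np
from scipy.stats import genpareto

def pwcet_estimate(times, threshold_quantile=0.95, exceed_prob=1e-6):
    times = np.asarray(times, dtype=float)
    u = np.quantile(times, threshold_quantile)       # tail threshold
    exceed = times[times > u] - u                     # exceedances over u
    zeta = exceed.size / times.size                   # empirical P(X > u)
    # Fit the GPD to the exceedances with the location fixed at 0.
    c, _, scale = genpareto.fit(exceed, floc=0)
    # Quantile whose overall exceedance probability equals exceed_prob.
    return u + genpareto.isf(exceed_prob / zeta, c, loc=0, scale=scale)

# Example with synthetic execution-time measurements.
rng = np.random.default_rng(0)
samples = 1.0 + rng.gamma(shape=2.0, scale=0.05, size=200_000)
print(f"pWCET at 1e-6 exceedance: {pwcet_estimate(samples):.4f}")
```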
- Conference Article
19
- 10.1145/2516821.2516842
- Oct 16, 2013
Semiconductor technology evolution suggests that permanent failure rates will increase dramatically with scaling, in particular for SRAM cells. While well-known approaches such as error correcting codes exist to recover from failures and provide fault-free chips, they will no longer be affordable in the future due to their non-scalable cost. Consequently, other approaches such as fine-grain disabling and reconfiguration of hardware elements (e.g. individual functional units or cache blocks) will become economically necessary. This fine-grain disabling will lead to degraded performance compared to a fault-free execution. To the best of our knowledge, all static worst-case execution time (WCET) estimation methods assume fault-free architectures. Their results are no longer safe when fine-grain disabling of hardware components is used, which degrades performance. In this paper we provide the first method that statically calculates a probabilistic WCET bound in the presence of permanent faults in instruction caches. From a given program, cache configuration, and probability of cell failure, the proposed method derives a probabilistic WCET bound. Because it relies on static analysis, the proposed method is guaranteed to identify the longest program path, its probabilistic nature stemming only from the presence of faults. The method is computationally tractable because it does not require an exhaustive enumeration of the possible locations of faulty cache blocks. Experimental results show that it provides WCET estimates very close to, but never below, a method that derives probabilistic WCETs by enumerating all possible locations of faulty cache blocks. The proposed method not only quantifies the impact of permanent faults on WCET estimates, but can also be used in architectural exploration frameworks to select the most appropriate fault management mechanisms.
- Research Article
12
- 10.1007/s11241-014-9212-x
- Nov 5, 2014
- Real-Time Systems
Semiconductor technology evolution suggests that permanent failure rates will increase dramatically with scaling, in particular for SRAM cells. While well-known approaches such as error correcting codes exist to recover from failures and provide fault-free chips, they will no longer be affordable in the future due to their growing cost. Consequently, other approaches like fine-grained disabling and reconfiguration of hardware elements (e.g. individual functional units or cache blocks) will become economically necessary. This fine-grained disabling will degrade performance compared to a fault-free execution. To the best of our knowledge, all static worst-case execution time (WCET) estimation methods assume fault-free processors. Their results are no longer safe when fine-grained disabling of hardware components is used. In this paper we provide the first method that statically calculates a probabilistic WCET bound in the presence of permanent faults in instruction caches. The proposed method derives a probabilistic WCET bound for a given program, cache configuration, and probability of cell failure. As our method relies on static analysis to bound the longest path, its probabilistic nature stems only from the probability that faults actually occur. Our method is computationally tractable because it does not require an exhaustive enumeration of all possible combinations of faulty cache blocks. Experimental results show that it provides WCET estimates very close to, but never below, a method that derives probabilistic WCETs by enumerating all possible locations of faulty cache blocks. The proposed method not only quantifies the impact of permanent faults on WCET estimates but, most importantly, can be used in architectural exploration frameworks to select the most appropriate fault management mechanisms and design parameters for current and future chip designs.
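As a back-of-the-envelope companion to this line of work (not the papers' static analysis itself), the following sketch shows the kind of fault model such an analysis starts from: the probability that a cache block contains at least one failed SRAM cell, and the binomial distribution of the number of faulty blocks. The cell failure probability, block size, and cache size used are illustrative.

```python
# Illustrative fault-model arithmetic (not the papers' full method): given a
# per-cell failure probability, derive the probability that a cache block is
# faulty and the probability of exactly k faulty blocks in the cache.
from math import comb

def p_block_faulty(p_cell, bits_per_block):
    """A block is faulty if at least one of its SRAM cells has failed."""
    return 1.0 - (1.0 - p_cell) ** bits_per_block

def p_k_faulty_blocks(k, n_blocks, p_block):
    """Binomial probability of exactly k faulty blocks among n_blocks."""
    return comb(n_blocks, k) * p_block**k * (1.0 - p_block) ** (n_blocks - k)

p_blk = p_block_faulty(p_cell=1e-6, bits_per_block=64 * 8)   # 64-byte blocks
print(f"P(block faulty) = {p_blk:.3e}")
print(f"P(no faulty block in a 256-block cache) = "
      f"{p_k_faulty_blocks(0, 256, p_blk):.4f}")
```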
- Research Article
- 10.5075/epfl-thesis-8146
- Jan 1, 2017
Modern computing systems are based on multi-processor systems, i.e. multiple cores on the same chip. Hard real-time systems are required to perform particular tasks within a certain amount of time; failure to do so constitutes unacceptable behavior. Hard real-time systems are found in safety-critical applications, e.g. airbag control software, flight control software, etc. In safety-critical applications, failure to meet the real-time constraints can have catastrophic effects. The safe and, at the same time, efficient deployment of applications with hard real-time constraints on multi-processors is a challenging task. Scheduling methods and Models of Computation that provide safe deployments require a realistic estimation of the Worst-Case Execution Time (WCET) of tasks. The simultaneous access of shared resources by parallel tasks causes interference delays due to hardware arbitration. Interference delays can be accounted for with the pessimistic assumption that all possible interference can happen, but the resulting schedules would be exceedingly conservative, and the benefits of multi-processors would be largely negated. Producing less pessimistic schedules is challenging due to the inter-dependency between WCET estimation and deployment optimisation: accurate estimation of interference delays, and thus of task WCET, depends on the way an application is deployed, while deployment is an optimisation problem that depends on the estimation of task WCET. Another efficiency gap, which is of consequence in several systems (e.g. airbag control), stems from the fact that tasks rarely execute at their WCET. Safe runtime adaptation based on the Actual Execution Times can yield additional improvements in terms of latency (more responsive systems). To achieve efficiency and retain adaptability, we propose that interference analysis should be coupled with the deployment process. The proposed interference analysis method estimates the possible amount of interference based on an architecture and an application model. As more information is provided, such as scheduling, memory mapping, etc., the per-task interference estimation becomes more accurate. Thus, the method computes interference-sensitive WCET estimates (isWCET). Based on the isWCET method, we propose a method to break the inter-dependency between WCET estimation and deployment optimisation. Initially, the isWCETs are over-approximated by assuming worst-case interference, and a safe deployment is derived. Subsequently, the proposed method computes accurate isWCETs by spatio-temporal exclusion, i.e. by excluding interference from tasks that share resources (space) but do not overlap in time. Based on the accurate isWCETs, the deployment solution is improved to provide better latency guarantees. We also propose a distributed runtime adaptation technique that aims to improve runtime latency. Using isWCET estimations restricts the possible adaptations, as an adaptation might increase the interference and violate the safety guarantees. The proposed technique statically introduces scheduling dependencies between tasks that prevent additional interference. At runtime, a self-timed scheduling policy that respects these dependencies is applied; it is proven to be safe and has minimal overhead. Experimental evaluation on the Kalray MPPA-256 shows that our methods improve isWCET by up to 36%, guaranteed latency by up to 46%, and runtime performance by up to 42%, with a consolidated performance gain of 50%.
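The following is a minimal, hypothetical sketch of the spatio-temporal exclusion idea described above (not the thesis' actual analysis): a task's interference-sensitive WCET only accumulates delay from tasks that both share a resource and can overlap with it in time under the current deployment. The task fields and the per-conflict delay are assumed simplifications.

```python
# Sketch of spatio-temporal exclusion for interference-sensitive WCET
# (isWCET). Hypothetical illustration: interference is only charged for
# tasks that share a resource (space) AND overlap in time with the task
# under analysis, given the current schedule.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    wcet_isolation: float      # WCET with no contention (cycles)
    start: float               # scheduled start (current deployment)
    end: float                 # scheduled finish
    resources: frozenset       # shared resources accessed (e.g. memory banks)
    delay_per_conflict: float  # assumed worst-case stall caused per contender

def is_wcet(task, all_tasks):
    delay = 0.0
    for other in all_tasks:
        if other.name == task.name:
            continue
        overlaps_in_time = other.start < task.end and task.start < other.end
        shares_resource = bool(task.resources & other.resources)
        if overlaps_in_time and shares_resource:
            delay += other.delay_per_conflict
    return task.wcet_isolation + delay

t1 = Task("t1", 1000, 0,   900,  frozenset({"bank0"}), 50)
t2 = Task("t2",  800, 0,   700,  frozenset({"bank0"}), 40)
t3 = Task("t3",  600, 950, 1500, frozenset({"bank0"}), 30)  # no overlap with t1
print(is_wcet(t1, [t1, t2, t3]))   # only t2's interference is counted
```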
- Conference Article
3
- 10.1109/rtss46320.2019.00022
- Sep 13, 2019
Static Worst-Case Execution Time (WCET) estimation techniques operate upon the binary code of a program in order to provide the necessary input for schedulability analysis techniques. Compilers used to generate this binary code include tens of optimizations that can radically change the flow information of the program. Such information is hard to maintain across optimization passes, which may render automatic extraction of important flow information, such as loop bounds, impossible. Thus, compiler optimizations, especially the sophisticated optimizations of mainstream compilers, are typically avoided. In this work, we explore for the first time iterative-compilation techniques that reconcile compiler optimizations and static WCET estimation. We propose a novel learning technique that selects sequences of optimizations that minimize the WCET estimate of a given program. We experimentally evaluate the proposed technique using an industrial WCET estimation tool (AbsInt aiT) over a set of 46 benchmarks from four different benchmark suites, including reference WCET benchmark applications, image processing kernels, and telecommunication applications. Experimental results show that WCET estimates are reduced on average by 20.3% using the proposed technique, compared to the best applicable compiler optimization level.
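To illustrate the iterative-compilation loop in the simplest possible form (random search is used here purely as a stand-in for the paper's learning-based selection), the sketch below searches over pass sequences and keeps the one with the lowest WCET estimate. The `compile_with` and `estimate_wcet` hooks are hypothetical placeholders for the real compiler invocation and WCET tool.

```python
# Minimal iterative-compilation sketch: explore optimization sequences and
# keep the one whose static WCET estimate is lowest. `compile_with` and
# `estimate_wcet` are hypothetical hooks standing in for the real compiler
# and the WCET estimation tool (e.g., AbsInt aiT in the paper).
import random

def search_best_sequence(passes, compile_with, estimate_wcet,
                         seq_len=5, budget=50, seed=0):
    rng = random.Random(seed)
    best_seq, best_wcet = None, float("inf")
    for _ in range(budget):
        seq = [rng.choice(passes) for _ in range(seq_len)]
        binary = compile_with(seq)        # build with this pass order
        wcet = estimate_wcet(binary)      # static WCET estimate of the result
        if wcet < best_wcet:
            best_seq, best_wcet = seq, wcet
    return best_seq, best_wcet

# Toy usage with stand-in hooks (a real setup would call the compiler/tool).
passes = ["inline", "unroll", "licm", "gvn", "sroa"]
fake_cost = {p: c for c, p in enumerate(passes, start=1)}
best = search_best_sequence(
    passes,
    compile_with=lambda seq: tuple(seq),
    estimate_wcet=lambda b: 1000 + sum(fake_cost[p] for p in b),
)
print(best)
```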
- Conference Article
10
- 10.3850/9783981537079_0551
- Jan 1, 2016
Fine-grained disabling and reconfiguration of hardware elements (functional units, cache blocks) will become economically necessary to recover from permanent failures, whose rate is expected to increase dramatically in the near future. This fine-grained disabling will lead to degraded performance compared to a fault-free execution. Until recently, all static worst-case execution time (WCET) estimation methods assumed fault-free processors, resulting in unsafe estimates in the presence of faults. The first static WCET estimation technique dealing with the presence of permanent faults in instruction caches was proposed in [1]. This study probabilistically quantified the impact of permanent faults on WCET estimates and demonstrated that the probabilistic WCET (pWCET) estimates of tasks increase rapidly with the probability of faults as compared to fault-free WCET estimates. In this paper, we show that very simple reliability mechanisms can mitigate the impact of faulty cache blocks on pWCETs. Two mechanisms that make part of the cache resilient to faults are analyzed. Experiments show that the gains in pWCET for these two mechanisms are on average 48% and 40% as compared to an architecture with no reliability mechanism.
- Conference Article
22
- 10.1109/ecrts.2013.33
- Jul 1, 2013
Existing timing analysis techniques to derive Worst-Case Execution Time (WCET) estimates assume that hardware in the target platform (e.g., the CPU) is fault-free. Given the increasing performance requirements of current Critical Real-Time Embedded Systems (CRTES), the use of high-performance features and smaller transistors in current and future hardware becomes a must. The use of smaller transistors helps provide more performance while maintaining low energy budgets; however, hardware fault rates increase noticeably, affecting the temporal behaviour of the system in general, and the WCET in particular. In this paper, we reconcile these two emergent needs of CRTES, namely tight (and trustworthy) WCET estimates and the use of hardware implemented with smaller transistors. To that end we propose the Degraded Test Mode (DTM) that, in combination with fault-tolerant hardware designs and probabilistic timing analysis techniques, (i) enables the computation of tight and trustworthy WCET estimates in the presence of faults, (ii) provides graceful average-case and worst-case performance degradation due to faults, and (iii) requires modifications neither in WCET analysis tools nor in applications. Our results show that DTM allows accounting for the effect of faults at analysis time with low impact on WCET estimates and negligible hardware modifications.
- Research Article
- 10.4028/www.scientific.net/amm.651-653.624
- Sep 1, 2014
- Applied Mechanics and Materials
Most embedded systems are real-time systems, so real-time performance is an important metric for embedded systems. Worst-case execution time (WCET) estimation for embedded programs can satisfy the requirements of hard real-time evaluation, so it is widely used in the evaluation of embedded systems. Based on a broad survey of progress in WCET estimation, this paper proposes a new classification of WCET estimation techniques. After introducing the principles of WCET estimation, it presents the various techniques used to estimate WCET and classifies them into two main streams, namely static and dynamic WCET estimation. Finally, it reviews the development of WCET analysis tools.
- Conference Article
10
- 10.1109/rtas.2014.6926005
- Apr 1, 2014
Hard real-time systems are moving toward complex systems comprising chips with different IP components connected with standard buses. AMBA is one of the most widely used bus interfaces and has already been included in processors in the real-time domain. However, AMBA was not designed to provide time-composable Worst Case Execution Time (WCET) estimates, which are desirable to reduce timing validation and verification costs. This paper analyzes and extends the AMBA Advanced High-performance Bus (AHB) specification to enable time-composable WCET estimates by design. Concretely, (1) we analyze the AMBA AHB in detail in the context of hard real-time systems, proving that it fails to provide time composability; (2) we define a restricted subset of AMBA AHB features, named restricted AHB (resAHB), that allows deriving time-composable, yet not tight, WCET estimates; and (3) we define an extension of resAHB, named Advanced High-performance Real-time Bus (AHRB), that includes the timing constraints in the specification. This allows deriving time-composable and tight WCET estimates. Our results show that AHRB can provide 3.5x tighter estimates than resAHB on average for EEMBC benchmarks.
- Conference Article
4
- 10.1109/indin.2016.7819140
- Jul 1, 2016
Obtaining Worst-Case Execution Time (WCET) estimates is a required step during software verification in real-time embedded systems. Measurement-Based Probabilistic Timing Analysis (MBPTA) aims at obtaining WCET estimates for industrial-size software running on hardware platforms comprising high-performance features. MBPTA relies on randomizing the timing behavior (functional behavior is left unchanged) of hard-to-predict events, such as the location of objects in memory (and hence their associated cache behavior), that significantly impact the software's WCET estimates. Software time-randomized caches (sTRc) have recently been proposed to enable MBPTA on top of commercial off-the-shelf (COTS) caches (e.g. with modulo placement). However, some random events may challenge the reliability of MBPTA on top of sTRc. In this paper, for sTRc and programs with homogeneously accessed addresses, we determine whether the number of observations taken at analysis, as part of the normal MBPTA application process, captures the cache events that significantly impact execution time and WCET. If this is not the case, our techniques provide the user with the number of extra runs to perform to guarantee that these cache events are captured for a reliable application of MBPTA. Our techniques are evaluated with synthetic benchmarks and an avionics application.
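A much-simplified view of the underlying question, assuming independent runs (this is an illustration, not the paper's derivation), is: how many measurement runs are needed so that a cache event with a given per-run probability is observed at least once with a chosen confidence? The sketch below computes that.

```python
# Simplified observation-coverage arithmetic, assuming independent runs
# (illustration only): number of runs needed so that an event with per-run
# probability p is seen at least once with confidence gamma.
import math

def runs_needed(p_event, confidence=0.999):
    """Smallest N with 1 - (1 - p_event)^N >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_event))

def coverage(p_event, n_runs):
    """Probability the event is seen at least once in n_runs runs."""
    return 1.0 - (1.0 - p_event) ** n_runs

print(runs_needed(0.01))      # runs needed for a 1%-probability cache event
print(coverage(0.01, 300))    # coverage achieved by 300 runs
```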
- Research Article
56
- 10.1145/3065924
- Jun 13, 2017
- ACM Transactions on Design Automation of Electronic Systems
Extreme Value Theory (EVT) has historically been used in domains such as finance and hydrology to model worst-case events (e.g., major stock market incidents). EVT takes as input a sample of the distribution of the variable to model and fits the tail of that sample to either the Generalised Extreme Value (GEV) or the Generalised Pareto Distribution (GPD). Recently, EVT has become popular in real-time systems to derive worst-case execution time (WCET) estimates of programs. However, the application of EVT is not straightforward and requires a detailed analysis of, and customisation for, the particular problem at hand. In this article, we tailor the application of EVT to timing analysis. To that end, (1) we analyse the response time of different hardware resources (e.g., cache memories) and identify those that may lead to radically different types of execution time distributions. (2) We show that one of these distributions, known as a mixture distribution, causes problems in the use of EVT. In particular, mixture distributions challenge not only properly selecting the GEV/GPD parameters (i.e., location, scale and shape) but also determining the size of the sample to ensure that enough tail values are passed to EVT and that only tail values are used by EVT to fit the GEV/GPD. Failing to select these parameters appropriately has a negative impact on the quality of the derived WCET estimates. We tackle these problems by (3) proposing Measurement-Based Probabilistic Timing Analysis using the Coefficient of Variation (MBPTA-CV), a new mixture-distribution-aware, WCET-suited MBPTA method that builds on recent EVT developments in other fields (e.g., finance) to automatically select the distribution parameters that best fit the maxima of the observed execution times. Our results on a simulation environment and a real board show that MBPTA-CV produces high-quality WCET estimates.
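The following sketch illustrates the coefficient-of-variation intuition behind MBPTA-CV (illustration only; the published method includes additional tests and safeguards): exceedances over a well-chosen threshold should look exponential, so their coefficient of variation should be close to 1. The candidate quantiles and the tolerance are assumptions.

```python
# Coefficient-of-variation sketch for tail/threshold selection (illustrative,
# not the published MBPTA-CV procedure): pick the lowest candidate threshold
# whose exceedances have a coefficient of variation close to 1.
import numpy as np

def cv_of_exceedances(times, threshold):
    exc = times[times > threshold] - threshold
    return exc.std(ddof=1) / exc.mean()

def pick_threshold(times, quantiles=np.arange(0.80, 0.995, 0.005), tol=0.1):
    """Return the lowest candidate threshold whose exceedance CV is ~1."""
    times = np.asarray(times, dtype=float)
    for q in quantiles:
        u = np.quantile(times, q)
        if abs(cv_of_exceedances(times, u) - 1.0) < tol:
            return u
    return None   # no candidate passed; more observations may be needed

rng = np.random.default_rng(1)
sample = 1.0 + rng.exponential(scale=0.02, size=50_000)
print(pick_threshold(sample))
```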
- Conference Article
- 10.1109/rtss46320.2019.00055
- Dec 1, 2019
Probabilistic timing analysis techniques have been proposed for real-time systems to remedy the problem that deterministic estimates of a task's Worst-Case Execution Time and Worst-Case Response Time can be both intractable and overly pessimistic. Often, the assumption is made that a task's response time and execution time probability distributions are independent of those of other tasks. This assumption may not hold in real systems. In this paper, we analyze the timing behavior of a simple periodic task on a Raspberry Pi model 3 running Arch Linux ARM. In particular, we observe and analyze the distributions of wake-up latencies and execution times for the sequential jobs released by a simple periodic task. We observe that the timing behavior of jobs is affected by release events occurring during a job's execution and by other processes running in between subsequent jobs of the periodic task. Using a data consistency approach, we investigate whether it is reasonable to model the timing distribution of jobs affected by release events and intermediate processes as translations of the empirical timing distribution of non-affected jobs. According to the analysis, this paper shows that a translated distribution model of non-affected jobs is invalid for the execution time distribution of jobs affected by intermediate processes. Regarding the wake-up latency distribution with intermediate processes, a translated distribution model is improbable, but cannot be completely ruled out.
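As one illustrative way to probe a translated-distribution hypothesis (the paper uses its own data-consistency approach, so this is only a stand-in), the sketch below shifts the non-affected sample by the difference in medians and compares it with the affected sample using a two-sample Kolmogorov-Smirnov test.

```python
# Rough check of a "translated distribution" hypothesis (illustration only,
# not the paper's method): shift the non-affected sample by the median
# difference and compare with the affected sample via a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def translation_check(unaffected, affected):
    unaffected = np.asarray(unaffected, float)
    affected = np.asarray(affected, float)
    shift = np.median(affected) - np.median(unaffected)
    stat, p_value = ks_2samp(unaffected + shift, affected)
    return shift, p_value   # small p-value: translation model is implausible

rng = np.random.default_rng(2)
base = rng.gamma(shape=3.0, scale=10.0, size=5_000)          # non-affected jobs
shifted = rng.gamma(shape=3.0, scale=10.0, size=5_000) + 15  # pure translation
print(translation_check(base, shifted))
```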
- Conference Article
4
- 10.4230/oasics.wcet.2006.675
- Jan 1, 2006
This paper examines the problem of determining bounds on the execution time of real-time programs. Execution time estimation is generally useful in the real-time software verification phase, but may also be used in other phases of the design and execution of real-time programs (scheduling, automatic parallelizing, etc.). This paper is devoted to worst-case execution time (WCET) analysis. We present a static WCET analysis approach aimed at automatically extracting the flow information used in computing the WCET estimate. The approach combines symbolic execution and path enumeration. The main idea is to avoid the loop unfolding performed by symbolic-execution-based approaches while providing a tight and safe WCET estimate.
- Research Article
2
- 10.1080/16843703.2017.1399971
- Nov 8, 2017
- Quality Technology & Quantitative Management
1. Problem Statement. Worst Case Execution Time (WCET) is the upper bound of the execution time of a real-time task running on a particular hardware platform. The estimation of WCET is of significant importance to the correctness of schedulability analysis and thus to the reliability of real-time applications. 2. Approach. Traditionally, WCET is often estimated based on static analysis techniques. However, advanced architectural features make the static analysis of WCET extremely complicated. As a result, measurement-based approaches have become popular in practice. Despite its popularity, this type of approach requires running a task with a large number of input settings to obtain safe and accurate WCET estimates. To address this challenge, we develop statistical regression models on a set of benchmark tasks to connect task execution time with a set of selected features. Our models can be used to estimate the WCET of new tasks without running a large number of sample executions. 3. Results. Our fitted regression models show that the execution times are highly related to the selected feature factors 'Load' and 'Missing Rate'. Through a comparison study, we evaluate different measurement-based approaches from both the safety and accuracy perspectives. The comparison results are summarized to give general guidelines for finding good measurement-based WCET estimation approaches.
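The sketch below illustrates the regression idea with synthetic data (the paper's feature definitions, data, and model form may differ): execution time is regressed on two stand-in features named after 'Load' and 'Missing Rate' using ordinary least squares, and the fitted model predicts the execution time of a new task.

```python
# Ordinary-least-squares sketch of regressing execution time on features
# (synthetic stand-ins for the paper's 'Load' and 'Missing Rate' features).
import numpy as np

rng = np.random.default_rng(3)
n = 500
load = rng.uniform(0.0, 1.0, n)          # stand-in for the 'Load' feature
miss_rate = rng.uniform(0.0, 0.3, n)     # stand-in for 'Missing Rate'
exec_time = 5.0 + 12.0 * load + 40.0 * miss_rate + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), load, miss_rate])   # design matrix
coef, *_ = np.linalg.lstsq(X, exec_time, rcond=None)
print(dict(zip(["intercept", "load", "miss_rate"], coef.round(2))))

# Predict the execution time of a new task from its features.
new = np.array([1.0, 0.9, 0.25])
print(float(new @ coef))
```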
- Conference Article
12
- 10.4230/oasics.wcet.2017.8
- Sep 19, 2017
Estimation of worst-case execution times (WCETs) is required to validate the temporal behavior of hard real-time systems. Heptane is an open-source software program that estimates upper bounds on execution times on MIPS and ARM v7 architectures, offered to the WCET estimation community to experiment with new WCET estimation techniques. The software architecture of Heptane was designed to be as modular and extensible as possible to facilitate the integration of new approaches. This paper is devoted to a description of Heptane and includes information on the analyses it implements, how to use it, and how to extend it.