Cache Interference-aware Task Partitioning for Non-preemptive Real-time Multi-core Systems

Abstract

Shared caches in multi-core processors introduce serious difficulties in providing guarantees on the real-time properties of embedded software, due to the interaction and the resulting contention in the shared caches. Prior work has studied the schedulability analysis of global scheduling for real-time multi-core systems with shared caches. This article considers another common scheduling paradigm: partitioned scheduling in the presence of shared cache interference. To this end, we propose CITTA, a cache interference-aware task partitioning algorithm. We first analyze the shared cache interference between two programs for set-associative instruction and data caches. Then, an integer programming formulation is constructed to calculate the upper bound on cache interference exhibited by a task, which is required by CITTA. We conduct a schedulability analysis of CITTA and formally prove its correctness. A set of experiments is performed to evaluate the schedulability performance of CITTA against global EDF scheduling and greedy partitioning approaches such as First-Fit and Worst-Fit, over both randomly generated task sets and realistic embedded workloads. Our empirical evaluations show that CITTA outperforms global EDF scheduling and the greedy partitioning approaches in terms of the number of task sets deemed schedulable.
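To make the idea concrete, here is a minimal sketch of what an interference-aware partitioning pass can look like. It is an illustration only, not CITTA itself: `interference_bound` is a hypothetical stand-in for the paper's integer-programming bound, and its fixed per-co-runner penalty is invented.

```python
# Illustrative interference-aware partitioning (not the paper's algorithm).
# Assumes implicit deadlines and an externally supplied interference bound.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    wcet: float    # worst-case execution time in isolation
    period: float  # implicit deadline = period

def interference_bound(task: Task, others: list[Task]) -> float:
    """Hypothetical stand-in for the ILP-based upper bound on extra
    execution time caused by shared-cache interference from tasks on
    other cores. Here: a crude 5% penalty per interfering task."""
    return 0.05 * task.wcet * len(others)

def interference_aware_partition(tasks: list[Task], num_cores: int):
    cores: list[list[Task]] = [[] for _ in range(num_cores)]
    # Heaviest-utilization-first is a common partitioning heuristic.
    for task in sorted(tasks, key=lambda t: t.wcet / t.period, reverse=True):
        best = None
        for core in cores:
            # Same-core tasks never run concurrently on a non-preemptive
            # core, so interference only comes from the other cores.
            others = [t for c in cores if c is not core for t in c]
            inflated = task.wcet + interference_bound(task, others)
            util = sum(t.wcet / t.period for t in core) + inflated / task.period
            if util <= 1.0 and (best is None or util < best[1]):
                best = (core, util)
        if best is None:
            return None  # this heuristic deems the set unschedulable
        best[0].append(task)
    return cores
```

Note that a plain utilization test is itself a simplification; the paper's schedulability analysis for non-preemptive partitioned scheduling is more involved.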

Similar Papers
  • Conference Article
  • Citations: 3
  • 10.1145/3372799.3394367
CITTA
  • Jun 16, 2020
  • Jun Xiao + 1 more

Shared caches in multi-core processors introduce serious difficulties in providing guarantees on the real-time properties of embedded software, due to the interaction and the resulting contention in the shared caches. Prior work has studied the schedulability analysis of global scheduling for real-time multi-core systems with shared caches. This paper considers another common scheduling paradigm: partitioned scheduling in the presence of shared cache interference. To this end, we propose CITTA, a cache interference-aware task partitioning algorithm. An integer programming formulation is constructed to calculate the upper bound on cache interference exhibited by a task, which is required by CITTA. We conduct a schedulability analysis of CITTA and formally prove its correctness. A set of experiments is performed to evaluate the schedulability performance of CITTA against global EDF scheduling over randomly generated task sets. Our empirical evaluations show that CITTA outperforms global EDF scheduling in terms of the number of task sets deemed schedulable.

  • Research Article
  • Citations: 13
  • 10.1109/tc.2020.2974224
Schedulability Analysis of Global Scheduling for Multicore Systems With Shared Caches
  • Oct 1, 2020
  • IEEE Transactions on Computers
  • Jun Xiao + 2 more

Shared caches in multicore processors introduce serious difficulties in providing guarantees on the real-time properties of embedded software due to the interaction and the resulting contention in the shared caches. To address this problem, we develop a new schedulability analysis for real-time multicore systems with shared caches, globally scheduled by Earliest Deadline First (EDF) and Fixed Priority (FP) algorithms. We construct an integer programming formulation, which can be transformed to an integer linear programming formulation, to calculate an upper bound on cache interference exhibited by a task within a given execution window. Using the integer programming formulation, an iterative algorithm is presented to obtain the upper bound on cache interference a task may exhibit during one job execution. The upper bound on cache interference is subsequently integrated into the schedulability analysis to derive a new schedulability condition. A range of experiments is performed to investigate how the schedulability is degraded by shared cache interference. We also evaluate the schedulability performance of EDF against FP scheduling over randomly generated task sets. Our empirical evaluations show that EDF is better than FP scheduling in terms of the number of task sets deemed schedulable.
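The iterative structure described here can be sketched as a simple fixed-point computation. The sketch below is schematic: `interference_in_window` stands in for the integer-programming step, and its toy linear model is invented for illustration.

```python
# Schematic fixed-point iteration: the cache interference a job suffers
# grows its execution window, and a longer window admits more interference,
# so iterate until the window stabilizes.
def interference_in_window(window: float) -> float:
    """Stand-in for the ILP-based bound on cache interference generated
    by co-running tasks within a window of the given length
    (toy model: 0.1 time units of delay per unit of window, capped)."""
    return min(2.0, 0.1 * window)

def job_interference_bound(wcet: float, max_iter: int = 100) -> float:
    """Upper bound on interference over one job execution."""
    window = wcet  # start from the WCET in isolation
    for _ in range(max_iter):
        new_window = wcet + interference_in_window(window)
        if new_window <= window + 1e-9:  # fixed point reached
            return new_window - wcet
        window = new_window
    return window - wcet  # conservative fallback if not converged
```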

  • Conference Article
  • Citations: 25
  • 10.1109/rtss.2017.00026
Schedulability Analysis of Non-preemptive Real-Time Scheduling for Multicore Processors with Shared Caches
  • Dec 1, 2017
  • Jun Xiao + 2 more

Shared caches in multicore processors introduce serious difficulties in providing guarantees on the real-time properties of embedded software due to the interaction and the resulting contention in the shared caches. To address this problem, we develop a new schedulability analysis for real-time multicore systems with shared caches. To the best of our knowledge, this is the first work that addresses the schedulability problem with inter-core cache interference. We construct an integer programming formulation, which can be transformed to an integer linear programming formulation, to calculate an upper bound on cache interference exhibited by a task within a given execution window. Using the integer programming formulation, an iterative algorithm is presented to obtain the upper bound on cache interference a task may exhibit during one job execution. The upper bound on cache interference is subsequently integrated into the schedulability analysis to derive a new schedulability condition. A range of experiments is performed to investigate how the schedulability is degraded by shared cache interference.

  • Research Article
  • Citations: 12
  • 10.5626/jcse.2013.7.1.67
Multicore Real-Time Scheduling to Reduce Inter-Thread Cache Interferences
  • Mar 30, 2013
  • Journal of Computing Science and Engineering
  • Yiqiang Ding + 1 more

The worst-case execution time (WCET) of each real-time task in multicore processors with shared caches can be significantly affected by inter-thread cache interferences. The worst-case inter-thread cache interferences are dependent on how tasks are scheduled to run on different cores. Therefore, there is a circular dependence between real-time task scheduling, the worst-case inter-thread cache interferences, and WCET in multicore processors, which is not the case for single-core processors. To address this challenging problem, we present an offline real-time scheduling approach for multicore processors by considering the worst-case inter-thread interferences on shared L2 caches. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and WCET. The experimental results demonstrate that the proposed approach can reduce the utilization of the resulting schedule by about 12% on average compared to the cyclic multicore scheduling approaches in our theoretical model. Our evaluation indicates that the enhanced scheduling approach is more likely to generate feasible and safe schedules with stricter timing constraints in multicore real-time systems.

  • Research Article
  • Citations: 3
  • 10.1109/tcad.2018.2857081
Predictability and Performance Aware Replacement Policy PVISAM for Unified Shared Caches in Real-time Multicores
  • Nov 1, 2018
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  • Mohammad Shihabul Haque + 1 more

Missing the deadline of an application task can be catastrophic in real-time systems. Therefore, to ensure timely completion of tasks, offline worst-case execution time and schedulability analysis is often performed for such real-time systems. One of the important inputs to this analysis is a safe upper bound on the misses in each processor cache used by the system. Cache miss prediction techniques have matured significantly for private caches in single-core processors; however, they remain a challenge for unified, shared caches in multicore processors. According to prior studies, a task's miss upper bound on a shared cache can be predicted using available private-cache prediction techniques only if the shared cache maintains core-based independent static partitions. The problem is that such partitions require an infeasible 'write-update consistency protocol' and waste valuable cache space through duplicate caching. In this regard, this paper presents a novel cache replacement policy called 'predictable variable isolation in shared antipodal memory' (PVISAM). Its replacement decisions generate virtual core-based partitions that support demand-based runtime size adjustment and line sharing to better utilize space. Moreover, these partitions require no consistency protocol. Trace-driven experimental results for Parsec benchmark applications reveal that the performance of a unified shared cache improves by 101.68× on average (minimum 1.09× and maximum 1138.50×) when PVISAM is used instead of either the aforementioned write-update protocol-based predictable partitioning or the widely used write-invalidate consistency protocol-based partitioning. PVISAM can improve cache performance by 0.74× on average (minimum 0.02× and maximum 1.12×) compared to having no partitions at all. Both predictable partitioning and PVISAM improve unified, shared cache predictability by 63.44% (minimum 26.89% and maximum 99.99%) and 19.36% (minimum 1.58% and maximum 72.51%) on average compared to no partitions and write-invalidate protocol-based partitioning, respectively. Experimental results for synthetic traces show that PVISAM markedly improves cache performance and predictability compared to its three competitors, even in scenarios that stress the cache.

  • Research Article
  • Citations: 2
  • 10.1002/ecj.11974
A Method of Shared File Cache for File Clone Function to Improve I/O Performance for Virtual Machines
  • Jun 19, 2017
  • Electronics and Communications in Japan
  • Hitoshi Kamei + 2 more

We propose a shared file cache method for cloned files used by virtual machines, called SCC. The file clone function copies files faster than the conventional read-and-write method and also reduces disk space. It is used to deploy virtual machines in virtual desktop infrastructure because it can quickly copy many virtual machine disk files. SCC uses the file cache of a shared file as a cache shared among its cloned files: on access to the shared file via a cloned file, the cached data of the shared file are returned to application programs. SCC thereby improves the I/O performance of the shared file by avoiding disk accesses. In this paper, we implement SCC and evaluate its I/O performance. The evaluation shows that SCC improves I/O throughput by about 38 times for random reads that hit the shared cache.

  • Conference Article
  • Citations: 124
  • 10.1109/ecrts.2013.19
A Coordinated Approach for Practical OS-Level Cache Management in Multi-core Real-Time Systems
  • Jul 1, 2013
  • Hyoseung Kim + 2 more

Many modern multi-core processors sport a large shared cache with the primary goal of enhancing the average-case performance of computing workloads. However, due to the resulting cache interference among tasks, the uncontrolled use of such a shared cache can significantly hamper the predictability and analyzability of multi-core real-time systems. Software cache partitioning has been considered an attractive approach to address this issue because it does not require any hardware support beyond that available on many modern processors. However, state-of-the-art software cache partitioning techniques face two challenges: (1) the memory co-partitioning problem, which results in page swapping or waste of memory, and (2) the availability of only a limited number of cache partitions, which causes degraded performance. These are major impediments to the practical adoption of software cache partitioning. In this paper, we propose a practical OS-level cache management scheme for multi-core real-time systems. Our scheme provides predictable cache performance, addresses the aforementioned problems of existing software cache partitioning, and efficiently allocates cache partitions to schedule a given task set. We have implemented and evaluated our scheme in Linux/RK running on the Intel Core i7 quad-core processor. Experimental results indicate that, compared to traditional approaches, our scheme is up to 39% more memory-space efficient and consumes up to 25% fewer cache partitions while maintaining cache predictability. Our scheme also yields a significant utilization benefit that increases with the number of tasks.
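For background, the software cache partitioning this work improves upon is typically implemented via page coloring: the OS controls which cache sets a task can touch by controlling which physical pages it receives. A minimal sketch with illustrative (not the paper's) cache parameters:

```python
# Page coloring in a nutshell: physical pages whose frame numbers share the
# low-order 'color' bits map to the same slice of cache sets, so allocating
# a task only pages of certain colors confines it to a cache partition.
PAGE_SIZE  = 4096               # bytes (so the page offset is 12 bits)
LINE_SIZE  = 64                 # bytes per cache line
CACHE_SIZE = 8 * 1024 * 1024    # 8 MiB shared last-level cache (illustrative)
ASSOC      = 16                 # 16-way set associative

num_sets      = CACHE_SIZE // (LINE_SIZE * ASSOC)  # 8192 sets
sets_per_page = PAGE_SIZE // LINE_SIZE             # 64 sets spanned per page
num_colors    = num_sets // sets_per_page          # 128 usable colors

def page_color(phys_addr: int) -> int:
    """Color = the cache-set-index bits that lie above the page offset."""
    return (phys_addr >> 12) % num_colors  # 12 = log2(PAGE_SIZE)
```

This sketch also exposes the memory co-partitioning problem the paper tackles: reserving k of the 128 colors for a task confines it to k/128 of physical memory as well, since cache colors and page frames are coupled.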

  • Research Article
  • Citations: 1
  • 10.1142/s0218126616500626
Dynamic Partitioned Cache Memory for Real-Time MPSoCs with Mixed Criticality
  • Mar 31, 2016
  • Journal of Circuits, Systems and Computers
  • Gang Chen + 4 more

Shared cache interference in multi-core architectures has been recognized as one of the major factors that degrade the predictability of mixed-criticality real-time systems. Due to unpredictable cache interference, the behavior of a shared cache is hard to predict and analyze statically in multi-core architectures executing mixed-criticality tasks, which not only makes estimating the worst-case execution time (WCET) difficult but also introduces significant worst-case timing penalties for critical tasks. Therefore, cache management in mixed-criticality multi-core systems has become a challenging task. In this paper, we present a dynamic partitioned cache memory for mixed-criticality real-time multi-core systems. In this architecture, critical tasks can dynamically allocate and release cache resources during their execution intervals according to the real-time workload. This dynamic partitioned cache can, on the one hand, provide predictable cache performance for critical tasks; on the other hand, the released cache can be dynamically used by non-critical tasks to improve their average performance. We demonstrate and prototype our system design on an embedded FPGA platform. Measurements from the prototype clearly demonstrate the benefits of the dynamic partitioned cache for mixed-criticality real-time multi-core systems.
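As a rough illustration of the usage pattern this enables (the interface below is invented; the paper's mechanism is a hardware design prototyped on an FPGA):

```python
# Invented sketch of the allocate/release pattern of a dynamically
# partitioned cache: critical tasks reserve ways for their execution
# interval; released ways become available to non-critical tasks.
class DynamicPartitionedCache:
    def __init__(self, total_ways: int):
        self.free_ways = total_ways  # ways currently usable by non-critical tasks

    def allocate(self, ways: int) -> bool:
        """A critical task reserves private ways before its interval."""
        if ways <= self.free_ways:
            self.free_ways -= ways
            return True
        return False  # insufficient ways; the caller must degrade or wait

    def release(self, ways: int) -> None:
        """After the interval, the ways return to the shared pool."""
        self.free_ways += ways
```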

  • Research Article
  • Citations: 6
  • 10.1016/j.sysarc.2013.07.004
Cache isolation for virtualization of mixed general-purpose and real-time systems
  • Jul 18, 2013
  • Journal of Systems Architecture
  • Ruhui Ma + 4 more

  • Research Article
  • 10.14569/ijacsa.2014.050920
Reducing Shared Cache Misses via dynamic Grouping and Scheduling on Multicores
  • Jan 1, 2014
  • International Journal of Advanced Computer Science and Applications
  • Wael Amr + 2 more

Multicore technology enables a system to perform more tasks with higher overall performance. However, this performance cannot be fully exploited due to the high miss rate in the second-level cache shared among the cores, which is one of the challenges of multicores. This paper addresses the dynamic co-scheduling of tasks in multicore real-time systems. The focus is on the basic idea of the megatask technique for grouping tasks that may affect the shared cache miss rate, and on Pfair scheduling, which is then used to reduce concurrency within the grouped tasks while ensuring the real-time constraints; consequently, the shared cache miss rate is reduced. Dynamic co-scheduling is proposed through the combination of the symbiotic technique with the megatask technique, co-scheduling tasks based on information collected using two schemes: the first measures the temporal working-set size of each running task at run time, while the second collects the shared cache miss rate of each running task at run time. Experiments show that the proposed dynamic co-scheduling can decrease the shared cache miss rate by 52% compared to static co-scheduling. This indicates that dynamic co-scheduling is important for achieving high performance with a shared cache when running heavy workloads such as multimedia applications, which require real-time response and continuous-media data types.

  • Conference Article
  • Citations: 23
  • 10.1109/emwrts.1997.613764
Hybrid instruction cache partitioning for preemptive real-time systems
  • Jun 11, 1997
  • J.V Busquets-Mataix + 2 more

Cache memories have historically been avoided in real-time systems because of their unpredictable behavior. In addition to the research focused on obtaining the worst-case execution time of cached programs (typically assuming no preemption), some techniques have been presented to deal with the cache interference due to preemptions (extrinsic or inter-task cache interference). These techniques either account for the extrinsic (cache) interference in the schedulability analysis, or annul it by partitioning the cache. This paper describes a new technique, hybrid partitioning, which is a mixture of the former two: it either provides a task with a private partition or accounts for the extrinsic interference that may arise. The hybrid technique outperforms the original two for any workload or hardware configuration. In conclusion, it represents a powerful yet general framework for dealing with extrinsic cache interference.

  • Conference Article
  • Citations: 3
  • 10.1109/ises52644.2021.00021
Timing Analysis in Multi-Core Real Time Systems
  • Dec 1, 2021
  • Preeti Godabole + 1 more

The motivation for using multicore processors in real-time systems comes from power requirements, processing speed, and a strong integration trend. Failing to meet timing constraints in real-time systems can lead to catastrophic results, so every task in a real-time system has to be mapped and scheduled to meet its deadline and utilize system resources efficiently. Measurable parameters are identified for each priority-based scheduling mechanism and application type. Simulations of varied time-critical systems are carried out for timing analysis using priority-based task-scheduling approaches. Extensive experimentation with log-uniform random task sets indicates the need for a multi-objective task-scheduling mechanism for real-time systems. A partitioned approach in priority-based schedulers is desirable for achieving reliability in multicore real-time systems.

  • Conference Article
  • Citations: 6
  • 10.1145/3575757.3593643
Analysis of Shared Cache Interference in Multi-Core Systems using Event-Arrival Curves
  • Jun 7, 2023
  • Thilo L Fischer + 1 more

Caches are used to bridge the gap between main memory and the significantly faster processor cores. In multi-core architectures, the last-level cache is often shared between cores. However, sharing a cache causes inter-core interference to emerge. Concurrently running tasks will experience additional cache misses as the competing tasks issue interfering accesses and trigger the eviction of data contained in the shared cache. Thus, to compute a task's worst-case execution time (WCET), a safe bound on the effects of inter-core cache interference has to be determined. In this paper, we propose a novel analysis approach for shared caches using the least recently used (LRU) replacement policy. The presented analysis leverages timing information to produce tight bounds on the worst-case interference. We describe how inter-core cache interference may be expressed as a function of time using event-arrival curves. Thus, by determining the maximal duration between subsequent accesses to a cache block, it is possible to bound the inter-core interference. This enables us to classify accesses as cache hits or potential misses. We implemented the analysis in a WCET analyzer and evaluated its performance for multi-core systems containing 2, 4, and 8 cores using shared caches from 4 KB to 32 KB. The analysis achieves significant improvements compared to a standard interference analysis, with WCET reductions of up to 60%. The average WCET reduction is 9% for dual-core, 15% for quad-core, and 11% for octa-core systems. The analysis runtime overhead ranges from a factor of 4× to 7× compared to the baseline analysis.
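The core classification step can be illustrated in a few lines. The arrival curve below is made up; in the actual analysis it would be derived from the timing behavior of the co-running tasks:

```python
# Toy version of the hit/miss classification: under LRU, a reused block in
# an A-way set survives if fewer than A interfering accesses reach that set
# between the two uses. An arrival curve eta(delta) bounds the interfering
# accesses in any window of length delta, so bounding the time between the
# two uses bounds the interference.
ASSOC = 8  # ways per cache set

def eta(delta_cycles: float) -> int:
    """Invented arrival curve: a burst of 2 interfering accesses, plus at
    most one further access to this set every 50 cycles."""
    return 2 + int(delta_cycles // 50)

def classify_reuse(max_reuse_time_cycles: float) -> str:
    """Classify the second access to a block given the maximal time
    between the two accesses to it."""
    return "hit" if eta(max_reuse_time_cycles) < ASSOC else "potential miss"

print(classify_reuse(200))  # "hit": at most 2 + 4 = 6 < 8 evicting accesses
print(classify_reuse(400))  # "potential miss": 2 + 8 = 10 >= 8
```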

  • Research Article
  • Citations: 3
  • 10.1007/s10766-016-0443-4
Priority Based Yield of Shared Cache to Provide Cache QoS in Multicore Systems
  • Jul 9, 2016
  • International Journal of Parallel Programming
  • Krupa Sivakumaran + 1 more

In multicore systems with a shared cache, multiple tasks run on multiple cores simultaneously and compete for the shared cache. Cache interference occurs when a task running on one core replaces cache data belonging to tasks running on other cores. With today's multicore systems running tasks of different priorities, the need to provide QoS guarantees on cache usage is gaining importance. Prior solutions to reduce cache interference and provide cache QoS have mainly used a cache-partitioning approach to split the cache among different cores, and were implemented and validated only on simulators, not on real systems. This paper discusses new techniques that are used on real systems to (1) experimentally measure the amount of interference caused by multiple co-scheduled programs, (2) reduce the interference miss rate of some programs at the expense of others, and (3) provide cache QoS guarantees to programs and ensure their miss rates remain below a ceiling.

  • Research Article
  • 10.1145/3786342
Global Scheduling of Weakly-Hard Real-Time Tasks using Job-Level Priority Classes
  • Mar 2, 2026
  • ACM Transactions on Embedded Computing Systems
  • Victor Gabriel Moyano + 3 more

Real-time systems are intrinsic components of many pivotal applications, such as self-driving vehicles, aerospace, and defense systems. The trend in these applications is to consolidate multiple tasks onto fewer, more powerful hardware platforms, e.g., multi-core systems, mainly to reduce cost and power consumption. Many real-time tasks, like control tasks, can tolerate occasional deadline misses thanks to robust algorithms; such tasks can be modeled using the weakly-hard model. The literature shows that leveraging the weakly-hard model can relax the over-provisioning associated with designed real-time systems. However, a wide range of the research focuses on single-core platforms. Therefore, we strive to extend the state of the art in scheduling weakly-hard real-time tasks to multi-core platforms. We present a global job-level fixed-priority scheduling algorithm together with its schedulability analysis. The scheduling algorithm leverages the tolerable number of consecutive deadline misses when assigning priorities to jobs. The proposed analysis extends Response Time Analysis (RTA) for global scheduling to test the schedulability of tasks. Hence, our analysis scales with the number of tasks and the number of cores because, unlike the literature, it depends neither on integer linear programming nor on reachability trees. Schedulability analyses show that the schedulability ratio is improved by 40% compared to global Rate Monotonic (RM) scheduling and by up to 60% compared to global EDF scheduling, which are the state-of-the-art schedulers in the RTEMS real-time operating system. Our evaluation on an industrial embedded multi-core platform running RTEMS shows that the scheduling overhead of our proposal does not exceed 60 nanoseconds.
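For context, the classical RTA fixed point that such analyses extend can be sketched as follows. This is only the textbook global-RTA shape, with a simplified interference term (no carry-in handling) and none of the paper's job-level priority classes:

```python
# Simplified global response-time analysis: higher-priority work released
# in a window of length r is shared among the cores; iterate to a fixed point.
import math

def rta(wcet: int, deadline: int, hp_tasks: list[tuple[int, int]],
        num_cores: int):
    """hp_tasks: (wcet, period) pairs of higher-priority tasks.
    Returns a response-time bound, or None if the deadline may be missed."""
    r = wcet
    while True:
        interference = sum(math.ceil(r / p) * c for c, p in hp_tasks)
        r_new = wcet + interference // num_cores
        if r_new == r:
            return r     # fixed point: response-time bound found
        if r_new > deadline:
            return None  # deemed unschedulable by this simple test
        r = r_new
```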
