Dynamic workload orchestration is a central concern when working with heterogeneous computing infrastructures in the edge-cloud continuum. In this context, FPGA-based computing nodes can leverage their flexibility, performance, and energy efficiency, provided that proper resource management strategies are in place. Many state-of-the-art systems rely on proactive power management techniques and task scheduling decisions, which in turn require deep knowledge of the applications to be accelerated and of the actual response of the target reconfigurable fabrics when executing them. While acquiring this knowledge at design time was feasible in the past, when applications were mostly static task graphs that did not change at run time, the highly dynamic nature of current workloads in the edge-cloud continuum, where tasks can be deployed on any node at any time, has removed this possibility. As a result, deriving such information at run time to make informed decisions has become essential. This paper presents an infrastructure to build incremental ML models that provide run-time power consumption and performance estimations in FPGA-based reconfigurable multi-accelerator systems operating under dynamic workloads. The proposed infrastructure features a novel stop-and-restart, resource-aware mechanism to monitor and control the model training and evaluation stages during normal system operation, enabling low-overhead model updates to account for either unexpected acceleration requests (i.e., tasks not previously considered by the models) or model drift (e.g., fabric degradation). Experimental results show that the proposed approach induces a maximum additional error of 3.66% compared to a continuous training alternative, while incurring only a 4.49% execution time overhead as opposed to the 20.91% overhead of continuous training.
The proposed modeling strategy enables innovative scheduling approaches in reconfigurable systems, as exemplified by the conflict-aware scheduler introduced in this work, which achieves up to a 1.35x speedup when executing the experimental workload. Additionally, the proposed approach demonstrates superior adaptability compared to other methods in the literature, particularly in responding to significant workload changes and in mitigating the effects of model overfitting. The portability of the proposed modeling methodology and monitoring infrastructure is also shown through their application to both Zynq-7000 and Zynq UltraScale+ devices.