OoO Core Research Articles

A key requirement for efficient general-purpose approximate computing is a combination of flexible hardware design and intelligent application tuning, which together can exploit the amount of approximation appropriate to each application and reap the best efficiency gains from it. To achieve this, we have identified three important features for building better general-purpose cross-layer approximation systems: ① individual per-operation (“spatio-temporally fine-grained”) approximation, ② hardware-cognizant application tuning for approximation, and ③ systemwide approximation synergy. We build an efficient general-purpose approximation system called SHASTA (Synergic HW-SW Architecture for Spatio-Temporal Approximation) to achieve these goals. First, in terms of hardware, SHASTA approximates both compute and memory. It proposes (a) a form of timing approximation called Slack-control Approximation, which controls the computation timing of each approximation operation, and (b) a Dynamic Pre-L1 Load Approximation mechanism to approximate loads prior to cache access. These hardware mechanisms are designed to achieve fine-grained, spatio-temporally diverse approximation. Next, SHASTA proposes a Hardware-cognizant Approximation Tuning mechanism that tunes an application’s approximation to achieve the optimum execution efficiency under the prescribed error tolerance. The tuning mechanism is implemented atop a gradient descent algorithm, so the application’s approximation is tuned along the steepest error vs. execution efficiency gradient. Finally, SHASTA is designed with a full-system perspective, which achieves synergic benefits across its optimizations, building a closer-to-ideal general-purpose approximation system.
SHASTA is implemented on top of an OoO core and achieves mean speedups and energy savings of 20%–40% over a non-approximate baseline at greater than 90% accuracy. These benefits are substantial for applications executing on a traditional general-purpose processing system. SHASTA can be tuned to specific accuracy constraints and execution metrics and is quantitatively shown to achieve 2–15× higher benefits, in terms of performance and energy, compared to prior work.
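The tuning idea in the abstract above can be illustrated with a small sketch. It is not SHASTA's tuner: it swaps in a greedy, coordinate-wise variant of gradient descent, and the names `tune_knobs` and `toy_measure`, the knob range [0, 1], and the toy cost model are all assumptions made for illustration. At each step it probes every approximation knob, takes the one step with the steepest time-saved-per-error-added slope, and stops when no step fits within the error budget.

```python
from typing import Callable, List, Sequence, Tuple

Measure = Callable[[Sequence[float]], Tuple[float, float]]


def tune_knobs(measure: Measure, n_knobs: int, error_budget: float,
               step: float = 0.05) -> Tuple[List[float], float, float]:
    """Greedy steepest-slope tuning of per-operation approximation knobs.

    `measure(knobs)` is a hypothetical black box returning
    (execution_time, output_error) for a knob setting in [0, 1]^n.
    """
    knobs = [0.0] * n_knobs          # start fully precise
    time, err = measure(knobs)
    while True:
        best = None
        for i in range(n_knobs):
            if knobs[i] + step > 1.0 + 1e-9:
                continue             # knob already at its maximum
            trial = list(knobs)
            trial[i] = min(1.0, trial[i] + step)
            t, e = measure(trial)
            if e > error_budget or t >= time:
                continue             # violates the budget or does not help
            # Time saved per unit of error added (finite-difference slope).
            slope = (time - t) / max(e - err, 1e-12)
            if best is None or slope > best[0]:
                best = (slope, trial, t, e)
        if best is None:
            return knobs, time, err  # no feasible improving step remains
        _, knobs, time, err = best


def toy_measure(knobs: Sequence[float]) -> Tuple[float, float]:
    """Assumed toy model: linear time savings, quadratic error growth."""
    gains = [0.10, 0.05, 0.02]       # hypothetical time saved at full approximation
    sens = [0.02, 0.20, 0.01]        # hypothetical error sensitivity per knob
    time = 1.0 - sum(g * k for g, k in zip(gains, knobs))
    error = sum(s * k * k for s, k in zip(sens, knobs))
    return time, error


knobs, time, err = tune_knobs(toy_measure, n_knobs=3, error_budget=0.10)
print(knobs, time, err)
```

In this toy model, the knobs whose error sensitivity is low are pushed to full approximation first, and the error-sensitive knob absorbs whatever budget is left. A real tuner would replace `toy_measure` with measurements taken on the target hardware.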

Modern microprocessor cores reach their high performance levels with the help of high clock rates, parallel and speculative execution of a large number of instructions, and vast cache hierarchies. Modern cores also have adaptive features to regulate power and temperature and avoid thermal emergencies. All of these features contribute to highly unpredictable execution times. In this article, we demonstrate that the execution time of in-order (IO), out-of-order (OoO), and OoO simultaneous multithreaded processors can be made stable and predictable by stabilizing their mega instructions executed per second (MIPS) rate via a proportional, integral, and differential (PID) gain feedback controller and dynamic voltage and frequency scaling (DVFS). Processor cores continue to consume power during idle cycles, which is highly undesirable, especially in real-time systems. In addition to meeting deadlines in real-time systems, our MIPS rate stabilization framework can reduce power and energy by avoiding idle cycles. If processors are equipped with MIPS rate stabilization, the execution time can be predicted. Because the MIPS rate remains steady, a stabilized processor meets deadlines on time in real-time systems, or in systems with quality-of-service execution latency requirements, at the lowest possible frequency. To demonstrate and evaluate this capability, we have selected a subset of the MiBench benchmarks with the widest execution rate variations. We stabilize their MIPS rate on a 1 GHz Pentium III-like OoO single-thread microarchitecture, a 1.32 GHz StrongARM-like IO microarchitecture, and the 1 GHz OoO processor augmented with two-way and four-way simultaneous multithreading. Both IO and OoO cores can take advantage of the stabilization framework, but the energy per instruction of the stabilized OoO core is lower because it runs at a lower frequency to meet the same deadlines.
The MIPS rate stabilization of complex processors using a PID feedback control loop is a general technique applicable to environments in which lower power or energy coupled with steady, predictable performance is desirable, although we target real-time systems more specifically in this article.
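The feedback loop described in the abstract above can be sketched as follows. It is a minimal illustration rather than the authors' controller: the plant model (MIPS = IPC × frequency in MHz), the gains, the frequency range, and the names `PID` and `stabilize_mips` are all assumptions. Each interval the controller measures the MIPS rate, computes the error against the target, and picks the next DVFS frequency.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PID:
    """Textbook discrete PID controller with a unit sampling interval."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def stabilize_mips(ipc_trace: Sequence[float], target_mips: float,
                   f_init: float = 500.0, f_min: float = 100.0,
                   f_max: float = 1000.0) -> List[Tuple[float, float]]:
    """Return (frequency_MHz, measured_MIPS) for each control interval.

    Assumed plant model: MIPS = IPC * frequency_MHz. The gains below are
    illustrative values chosen for this toy model, not taken from the article.
    """
    pid = PID(kp=0.2, ki=0.4, kd=0.0)
    freq = f_init
    history = []
    for ipc in ipc_trace:
        mips = ipc * freq                      # rate observed this interval
        history.append((freq, mips))
        correction = pid.step(target_mips - mips)
        # Clamp to the assumed DVFS range. There is no anti-windup here,
        # which a real controller would need if the clamp were hit often.
        freq = min(f_max, max(f_min, f_init + correction))
    return history


# A program phase with IPC 1.0 followed by a phase with IPC 0.5.
trace = [1.0] * 100 + [0.5] * 100
history = stabilize_mips(trace, target_mips=400.0)
print(history[99], history[199])
```

When the IPC halves in the second phase, the controller raises the frequency until the MIPS rate returns to the target. This is the behaviour the abstract relies on for predictable execution time.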

Articles published on OoO Core

Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores

The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

GraphAttack

SHASTA

Genome Sequence Alignment - Design Space Exploration for Optimal Performance and Energy Architectures

Fast Data Delivery for Many-Core Processors

Nucleus

Increasing the efficiency and feasibility of configurable computing units

Analyzing Behavior Specialized Acceleration

Efficient execution of memory access phases using dataflow specialization

Dynamic MIPS Rate Stabilization for Complex Processors

ZSim

Exploiting address compression and heterogeneous interconnects for efficient message management in tiled CMPs

Dynamic MIPS rate stabilization in out-of-order processors
