Abstract

A key requirement for efficient general purpose approximate computing is an amalgamation of flexible hardware design and intelligent application tuning, which together can leverage the appropriate amount of approximation that the applications engender and reap the best efficiency gains from them. To achieve this, we have identified three important features to build better general-purpose cross-layer approximation systems: ① individual per-operation (“spatio-temporally fine-grained”) approximation, ② hardware-cognizant application tuning for approximation, ③ systemwide approximation-synergy. We build an efficient general purpose approximation system called SHASTA: Synergic HW-SW Architecture for Spatio-Temporal Approximation, to achieve these goals. 1 First, in terms of hardware, SHASTA approximates both compute and memory—SHASTA proposes (a) a form of timing approximation called Slack-control Approximation, which controls the computation timing of each approximation operation and (b) a Dynamic Pre-L1 Load Approximation mechanism to approximate loads prior to cache access. These hardware mechanisms are designed to achieve fine-grained spatio-temporally diverse approximation. Next, SHASTA proposes a Hardware-cognizant Approximation Tuning mechanism to tune an application’s approximation to achieve the optimum execution efficiency under the prescribed error tolerance. The tuning mechanism is implemented atop a gradient descent algorithm and, thus, the application’s approximation is tuned along the steepest error vs. execution efficiency gradient. Finally, SHASTA is designed with a full-system perspective, which achieves Synergic benefits across its optimizations, building a closer-to-ideal general purpose approximation system. SHASTA is implemented on top of an OOO core and achieves mean speedups/energy savings of 20%–40% over a non-approximate baseline for greater than 90% accuracy—these benefits are substantial for applications executing on a traditional general purpose processing system. SHASTA can be tuned to specific accuracy constraints and execution metrics and is quantitatively shown to achieve 2–15× higher benefits, in terms of performance and energy, compared to prior work.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call