Abstract

Dynamic adaptation is a post-silicon optimization technique that adapts the hardware to workload phases. However, current adaptive approaches are oblivious to implicit phases that arise from operating on irregular data, such as sparse linear algebra operations. Implicit phases are short-lived and do not exhibit consistent behavior throughout execution. This calls for a high-accuracy, low-overhead runtime mechanism for adaptation at a fine granularity. Moreover, adopting such techniques for reconfigurable manycore hardware, such as coarse-grained reconfigurable architectures (CGRAs), adds complexity due to synchronization and resource contention. We propose a lightweight machine learning-based adaptive framework called SparseAdapt. It enables low-overhead control of configuration parameters to tailor the hardware to both implicit (data-driven) and explicit (code-driven) phase changes. SparseAdapt is implemented within the runtime of a recently proposed CGRA called Transmuter, which has been shown to deliver high performance for irregular sparse operations. SparseAdapt can adapt configuration parameters such as resource sharing, cache capacities, prefetcher aggressiveness, and dynamic voltage-frequency scaling (DVFS). Moreover, it can operate under the constraints of either (i) high energy-efficiency (maximal GFLOPS/W), or (ii) high power-performance (maximal GFLOPS³/W). We evaluate SparseAdapt with sparse matrix-matrix and matrix-vector multiplication (SpMSpM and SpMSpV) routines across a suite of uniform-random, power-law, and real-world matrices, in addition to end-to-end evaluation on two graph algorithms. SparseAdapt achieves performance on SpMSpM similar to that of the largest static configuration, with 5.3× better energy-efficiency. Furthermore, on both performance and efficiency, SparseAdapt is within 13% of an Oracle that adapts the configuration of each phase with global knowledge of the entire program execution. Finally, SparseAdapt outperforms the state-of-the-art approach for runtime reconfiguration by up to 2.9× in terms of energy-efficiency.

Highlights

  • Sparse linear algebra operations are key components of a plethora of modern applications, from graph analytics to scientific computing and machine learning [4, 5, 9, 10, 23, 50, 55, 64, 65, 68, 69, 70, 71]

  • In order to tackle these challenges, we propose an adaptive runtime framework, SparseAdapt, that reconfigures a coarse-grained reconfigurable architecture (CGRA) to adapt to evolving phases in sparse computation kernels

  • We evaluated our proposed framework first against the Baseline, Max Cfg, and Best Avg configurations, followed by upper-bound studies

Introduction

Sparse linear algebra operations are key components of a plethora of modern applications, from graph analytics to scientific computing and machine learning [4, 5, 9, 10, 23, 50, 55, 64, 65, 68, 69, 70, 71]. Recent work has led to a myriad of proposals for optimizing sparse computation through fixed-function accelerator designs [3, 40, 57, 58, 72]. While these demonstrate energy-efficiency improvements on the order of hundreds of times over a GPU, there is an important trade-off in terms of loss of flexibility, i.e., such designs are only applicable to a few kernels. CGRAs incorporate word-granular operations to overcome the energy inefficiency of field-programmable gate arrays (FPGAs), while retaining programmability. They allow for hardware reconfiguration at the granularity of the processing element (PE) array, network fabric, or the memory subsystem.
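To make the two optimization targets from the abstract concrete, the sketch below shows how a runtime could score candidate hardware configurations against measured counters and pick the best one per phase. This is a minimal illustration, not SparseAdapt's actual (ML-based) implementation: the `Config` fields, the `samples` measurement format, and the function names are all hypothetical assumptions; only the two objective formulas (GFLOPS/W and GFLOPS³/W) come from the paper's description.

```python
# Hypothetical sketch of per-phase configuration selection in the style
# described for SparseAdapt. All names and parameters are illustrative
# assumptions; only the two objectives (GFLOPS/W, GFLOPS^3/W) follow the
# paper's stated optimization targets.
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    cache_kb: int        # cache capacity (assumed knob)
    prefetch_depth: int  # prefetcher aggressiveness (assumed knob)
    freq_ghz: float      # DVFS operating point (assumed knob)


def objective(gflops: float, watts: float, mode: str) -> float:
    """Score one measurement under the chosen optimization target."""
    if mode == "energy":       # high energy-efficiency: maximal GFLOPS/W
        return gflops / watts
    if mode == "powerperf":    # high power-performance: maximal GFLOPS^3/W
        return gflops ** 3 / watts
    raise ValueError(f"unknown mode: {mode}")


def pick_config(samples: dict, mode: str = "energy") -> Config:
    """Return the configuration whose measured counters score best.

    `samples` maps each candidate Config to a (gflops, watts) pair
    measured over the most recent execution phase.
    """
    return max(samples, key=lambda cfg: objective(*samples[cfg], mode))
```

Note how the two targets can disagree: a small, low-power configuration may win on GFLOPS/W while a larger, faster one wins on GFLOPS³/W, since cubing performance rewards speed over power savings.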

