Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode

Gianmarco Ottavi, Davide Rossi, Angelo Garofalo, Alfio Di Mauro, Giuseppe Tagliavini, Luca Benini, Francesco Conti

doi:10.1109/tcsi.2023.3254810

Open Access

https://doi.org/10.1109/tcsi.2023.3254810

Abstract

Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms to resource-constrained, battery-powered devices while retaining the flexibility granted by instruction processor-based architectures poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have proven to be valid strategies for tackling these problems. We present Dustin, a fully programmable compute cluster integrating 16 RISC-V cores capable of 2- to 32-bit arithmetic and all possible mixed-precision combinations. In addition to a conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm, Dustin introduces a Vector Lockstep Execution Mode (VLEM) to minimize power consumption in highly data-parallel kernels. In VLEM, a single leader core fetches instructions and broadcasts them to the 15 follower cores. Clock gating the Instruction Fetch (IF) stages and private caches of the follower cores leads to a 38% power reduction. The cluster, implemented in 65 nm CMOS technology, achieves a peak performance of 58 GOPS and a peak efficiency of 1.15 TOPS/W.
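
To make the idea of low-bitwidth, mixed-precision arithmetic more concrete, the following is a minimal Python sketch of a packed dot product between 8-bit activations and 2-bit weights stored in 32-bit words. It is an illustration only and does not model Dustin's actual instruction set: the function names (`pack`, `unpack`, `mixed_dotp`), the two's-complement lane layout, and the rule that the wider operand sets the number of lanes are all assumptions made for this example.

```python
def pack(values, bits):
    """Pack signed integers into one 32-bit word, lowest lane first.

    Hypothetical helper: unused upper lanes are left as zero.
    """
    lanes = 32 // bits
    if len(values) > lanes:
        raise ValueError(f"at most {lanes} values of {bits} bits fit in 32 bits")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in a signed {bits}-bit lane")
        word |= (v & mask) << (i * bits)  # two's-complement encoding of the lane
    return word


def unpack(word, bits, count):
    """Extract `count` signed lanes of width `bits`, lowest lane first."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    out = []
    for i in range(count):
        field = (word >> (i * bits)) & mask
        out.append(field - (1 << bits) if field & sign else field)
    return out


def mixed_dotp(act_word, act_bits, wgt_word, wgt_bits, acc=0):
    """Accumulate a dot product over packed lanes of different widths.

    Assumption: the wider operand determines how many lanes one
    operation consumes (e.g. 4 lanes for 8-bit x 2-bit).
    """
    lanes = 32 // max(act_bits, wgt_bits)
    acts = unpack(act_word, act_bits, lanes)
    wgts = unpack(wgt_word, wgt_bits, lanes)
    return acc + sum(a * w for a, w in zip(acts, wgts))


# Four 8-bit activations fill a word; sixteen 2-bit weights fit in one word.
activations = pack([10, -3, 7, 100], 8)
weights = pack([1, -2, 0, 1], 2)
result = mixed_dotp(activations, 8, weights, 2)
```

In this sketch a single 32-bit word holds sixteen 2-bit weights but only four 8-bit values, which shows how narrower operands reduce memory footprint, one of the challenges the abstract names.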

Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
Publication Date: Jun 1, 2023
Citations: 7

Similar Papers

Structured representation in deep neural network systems
Caiwen Ding
10 May 2021

Re2PIM
Yilong Zhao ... Li Jiang
22 Jun 2021

Analysis of a Pipelined Architecture for Sparse DNNs on Embedded Systems
Adrian Alcolea Moreno ... Hortensia Mecha
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | VOL. 28
08 Jul 2020

Energy-Efficient Deep Neural Network Optimization via Pooling-Based Input Masking
Jiankang Ren ... Huawei Lv
18 Jul 2022
