Architecting a Flash-Based Storage System for Low-Cost Inference of Extreme-Scale DNNs

Yunho Jin, Shine Kim, Tae Jun Ham, Jae W Lee

doi:10.1109/tc.2022.3209920

Abstract

The size of deep neural network (DNN) models has been exploding rapidly, demanding a colossal amount of memory capacity. For example, Google has recently scaled its Switch Transformer to a parameter size of up to 6.4 TB. However, today's HBM DRAM-based memory system for GPUs and DNN accelerators is suboptimal for these extreme-scale DNNs, as it fails to provide enough capacity while its massive bandwidth is poorly utilized. Thus, we propose Leviathan, a DNN inference accelerator that instead integrates a cost-effective flash-based storage system. We carefully architect the storage system to provide enough memory bandwidth while preventing the performance drop caused by read disturbance errors. Our evaluation of Leviathan demonstrates an 8.3× throughput gain compared to the iso-FLOPS DNN accelerator with conventional SSDs and up to 19.5× higher memory cost-efficiency than the HBM-based DNN accelerator.
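
To give a rough sense of the capacity gap the abstract describes, the following back-of-envelope sketch counts how many fixed-capacity memory devices would be needed just to hold the parameters of a 6.4 TB model. The per-device capacity of 80 GB and the function name `devices_needed` are illustrative assumptions, not figures or names taken from the paper.

```python
# Back-of-envelope sketch: how many fixed-capacity memory devices are needed
# just to hold a model's parameters. The per-device capacity below is a
# hypothetical placeholder, not a figure from the paper.


def devices_needed(model_gb: int, device_gb: int) -> int:
    """Return the smallest device count whose total capacity covers the model."""
    if model_gb < 0 or device_gb <= 0:
        raise ValueError("model_gb must be >= 0 and device_gb must be > 0")
    return -(-model_gb // device_gb)  # integer ceiling division


MODEL_GB = 6_400        # 6.4 TB, the Switch Transformer size cited in the abstract
ASSUMED_DEVICE_GB = 80  # hypothetical HBM capacity per accelerator (assumption)

if __name__ == "__main__":
    print(devices_needed(MODEL_GB, ASSUMED_DEVICE_GB))
```

Under these assumed numbers the parameters alone would span 80 devices, which illustrates why capacity, rather than bandwidth, may become the limiting resource for models of this scale.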

Journal: IEEE Transactions on Computers
Publication Date: Jan 1, 2022
Citations: 1

Similar Papers

A Reconfigurable Deep Neural Network on Chip Design with Flexible Convolutional Operations
Kun-Chih Chen ... Yi-Sheng Liao
02 Oct 2022

Dynamic Mapping Mechanism to Compute DNN Models on a Resource-limited NoC Platform
Kun-Chih Jimmy Chen ... Jing-Wen Liang
19 Apr 2021

An Error Compensation Technique for Low-Voltage DNN Accelerators
Daehan Ji ... Jongsun Park
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | VOL. 29
15 Dec 2020

Joint Protection Scheme for Deep Neural Network Hardware Accelerators and Models
Jingbo Zhou ... Xinmiao Zhang
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | VOL. 42
01 Dec 2023
