Abstract

Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access overhead. In this work, we propose a processing-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs to execute multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application-specific integrated circuit (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41–137× and 631–1074× speedup, and 123–383× and 320–602× energy efficiency, over GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
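To illustrate the idea of keeping weights in DRAM and performing MACs in place, below is a minimal conceptual sketch (not the paper's actual mapping scheme): the matrix-vector multiply of one autoregressive decode step is partitioned row-wise across DRAM banks, so each bank accumulates over the weight rows it already stores, and only the small activation vector and partial results cross the chip boundary. All names, the bank count, and the hidden dimension are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of bank-level MAC partitioning for one decode step.
NUM_BANKS = 16   # assumed number of PIM-capable DRAM banks
HIDDEN = 1024    # assumed hidden dimension of the GPT layer

rng = np.random.default_rng(0)
W = rng.standard_normal((HIDDEN, HIDDEN)).astype(np.float32)  # weights resident in DRAM
x = rng.standard_normal(HIDDEN).astype(np.float32)            # activation vector for the new token

# Row-wise mapping: each bank owns a contiguous slice of weight rows.
bank_slices = np.array_split(np.arange(HIDDEN), NUM_BANKS)

def bank_mac(bank_rows: np.ndarray, vec: np.ndarray) -> np.ndarray:
    """MACs executed locally inside one bank: weight rows never leave the chip."""
    return W[bank_rows] @ vec

# The ASIC broadcasts x, each bank returns its output slice, and the ASIC
# concatenates the partial results (and handles non-linear functions).
y = np.concatenate([bank_mac(rows, x) for rows in bank_slices])

assert np.allclose(y, W @ x, atol=1e-4)
print("rows per bank:", [len(r) for r in bank_slices])
```

The sketch only mimics the data movement pattern in software; the actual speedup comes from the DRAM banks executing these MACs in parallel near the stored weights.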
