Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small.

Zhehui Wang, Tao Luo, Cheng Liu, Weichen Liu, Rick Siow Mong Goh, Weng-Fai Wong

doi:10.1109/tpami.2024.3483654

Abstract

Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs comes with a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, having demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors possess higher density than conventional memory technologies, making them highly suitable for managing the extreme model sizes associated with LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, the size of LLMs is increasing rapidly and already surpasses the capabilities of state-of-the-art memristor chips. Second, LLMs often incorporate multi-head attention blocks, which involve non-weight-stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at performing linear operations, they are not capable of executing the complex nonlinear operations in LLMs, such as softmax and layer normalization. To address these challenges, we present a novel memristor crossbar architecture that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies associated with off-chip communication. Our testing on BERT Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture achieves enhancements of up to 39× in area overhead and 18× in energy consumption. Compared to modern TPU/GPU systems, our architecture demonstrates at least a 68× reduction in the area-delay product and a significant 69% energy consumption reduction.
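
The second challenge can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the paper: the matrices `W_q` and `W_k`, their values, and the helper names are hypothetical. It shows that the query and key projections multiply an input by fixed weights (the weight-stationary case a crossbar is suited to), whereas the attention score multiplies two input-dependent matrices, so neither operand could be programmed into the crossbar ahead of time.

```python
# Illustrative sketch only; names, shapes, and values are hypothetical.

def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

# Fixed after training: these could be written into crossbar cells once
# and reused for every input (weight-stationary).
W_q = [[1.0, 0.0], [0.0, 2.0]]
W_k = [[0.5, 0.5], [1.0, -1.0]]

def attention_scores(x):
    q = matmul(x, W_q)  # input times fixed weights
    k = matmul(x, W_k)  # input times fixed weights
    # Both q and k change with every input, so this product has no
    # static operand (non-weight-stationary).
    return matmul(q, transpose(k))

scores_a = attention_scores([[1.0, 2.0], [3.0, 4.0]])
scores_b = attention_scores([[0.0, 1.0], [1.0, 0.0]])
print(scores_a)  # [[-3.5, -4.5], [-4.5, -3.5]]
print(scores_b)  # [[-2.0, 1.0], [1.0, 0.5]]
```

The two calls reuse the same `W_q` and `W_k` but produce different `q` and `k`, which is the property that makes the score computation awkward for hardware that stores one operand in place. Scaling and softmax, which the abstract lists under the third challenge, are omitted here.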


More From: IEEE Transactions on Pattern Analysis and Machine Intelligence


Similar Papers

Reconciling the contrasting narratives on the environmental impact of large language models
Shaolei Ren ... Andrew W Torrance
Scientific Reports | VOL. 14
01 Nov 2024

Navigating GPT-4 and BERT: A Dual Perspective on Financial and Political Sentiment Analysis
Akash Ghosh ... Rahul Sarkar
International Journal for Research in Applied Science and Engineering Technology | VOL. 11
31 Dec 2024

Large Language Models for Time Series: A Survey
Xiyuan Zhang ... Jingbo Shang
01 Aug 2024

LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models
Xiaoning Feng ... Xiaohong Han
ACM Transactions on Software Engineering and Methodology
13 May 2024
