Abstract

Large Language Models (LLMs), based on the transformer architecture, have demonstrated remarkable capabilities in natural language processing tasks, enabling machines to generate human-like text and engage in meaningful dialogues. However, the exponential growth in model parameters has created bottlenecks in inference speed and energy efficiency. Compute-in-memory (CIM) technology offers a promising solution for accelerating AI inference by performing analog computations directly within memory, potentially reducing latency and power consumption. While CIM has been successfully applied to accelerate Convolutional Neural Networks (CNNs), the matrix–matrix multiplication (MatMul) operations inherent in the scaled dot-product attention of the transformer present unique challenges for direct CIM implementation. In this work, we propose InMemQK, a compute-in-memory-based attention accelerator that optimizes MatMul operations through software–hardware co-design. At the software level, InMemQK employs product quantization (PQ) to eliminate data dependencies. At the hardware level, InMemQK integrates energy-efficient time-domain MAC macros for ADC-free computations. Experimental results show that InMemQK achieves 13.2×–13.9× lower power consumption than existing CIM-based accelerators.
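
To make the role of product quantization concrete, the following is a minimal sketch (not the paper's implementation) of how PQ can approximate the Q·K^T MatMul of scaled dot-product attention: key vectors are split into subspaces and encoded against small per-subspace codebooks, and a query's dot products are then recovered from precomputed lookup tables rather than data-dependent multiply–accumulates. All sizes, the use of plain k-means, and the function names here are illustrative assumptions.

```python
# Illustrative sketch only: PQ approximation of attention logits q . K^T.
# Sizes, k-means training, and function names are assumptions, not the paper's design.
import numpy as np

def train_pq_codebooks(K, n_sub=4, n_centroids=16, iters=10, seed=0):
    """Learn one small codebook per subspace of the key matrix K (plain k-means)."""
    rng = np.random.default_rng(seed)
    d = K.shape[1]
    d_sub = d // n_sub
    codebooks, codes = [], []
    for s in range(n_sub):
        sub = K[:, s * d_sub:(s + 1) * d_sub]
        cent = sub[rng.choice(len(sub), n_centroids, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(((sub[:, None] - cent[None]) ** 2).sum(-1), axis=1)
            for c in range(n_centroids):
                if np.any(assign == c):
                    cent[c] = sub[assign == c].mean(axis=0)
        codebooks.append(cent)
        codes.append(assign)
    return codebooks, np.stack(codes, axis=1)  # codes: (n_keys, n_sub)

def pq_attention_scores(q, codebooks, codes):
    """Approximate q . K^T: build per-subspace dot-product tables for q,
    then replace each key's dot product with table lookups plus additions."""
    n_sub = len(codebooks)
    d_sub = q.shape[0] // n_sub
    tables = np.stack([codebooks[s] @ q[s * d_sub:(s + 1) * d_sub]
                       for s in range(n_sub)])           # (n_sub, n_centroids)
    return tables[np.arange(n_sub), codes].sum(axis=1)   # (n_keys,)

# Usage: compare exact and PQ-approximated attention logits for one query.
rng = np.random.default_rng(1)
K = rng.standard_normal((64, 32)).astype(np.float32)
q = rng.standard_normal(32).astype(np.float32)
cbs, codes = train_pq_codebooks(K)
approx = pq_attention_scores(q, cbs, codes) / np.sqrt(32)
exact = (K @ q) / np.sqrt(32)
print("max abs error:", np.abs(approx - exact).max())
```

Under this sketch's assumptions, the lookup-and-accumulate structure is what makes the computation amenable to in-memory, ADC-free evaluation: the query-dependent work reduces to small table constructions and index-driven additions.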
