SelfCP: Compressing over-limit prompt via the frozen large language model itself

Jun Gao, Ziqiang Cao, Wenjie Li

doi:10.1016/j.ipm.2024.103873

Abstract

Long prompts lead to high hardware costs when using transformer-based Large Language Models (LLMs). Unfortunately, many tasks, such as summarization, inevitably involve long documents, and the wide application of in-context learning easily makes prompt length explode. This paper proposes a Self-Compressor (SelfCP), which uses the target LLM itself to compress over-limit prompts into dense vectors on top of a sequence of learnable embeddings (memory tags) while keeping the allowed prompts unmodified. The dense vectors are then projected into memory tokens via a learnable connector, allowing the same LLM to understand them. The connector and the memory tag are tuned in a supervised manner under the language modeling objective of the LLM on relatively long texts selected from publicly accessible datasets, including an instruction dataset so that SelfCP responds to various prompts, while the target LLM remains frozen during training. We build the lightweight SelfCP upon 2 different backbones with merely 17M learnable parameters originating from the connector and a learnable embedding. Evaluations on both English and Chinese benchmarks demonstrate that SelfCP effectively substitutes 12× over-limit prompts with memory tokens to reduce memory costs and boost inference throughput while improving response quality. This strong performance offers an efficient solution for LLMs to tackle long prompts without training LLMs from scratch.
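
To make the 12× figure concrete, the following is a rough back-of-the-envelope sketch of how many tokens the LLM would see after compression. It is not the authors' implementation: the function name `compressed_length`, the assumption that the first `allowed_len` tokens are the ones kept verbatim, and the rounding up of the memory-token count are all illustrative assumptions not stated in the abstract.

```python
def compressed_length(prompt_len: int, allowed_len: int, ratio: int = 12) -> dict:
    """Estimate the sequence length after over-limit tokens become memory tokens.

    Assumptions (hypothetical, for illustration only):
      * the first `allowed_len` tokens are kept unmodified;
      * every `ratio` over-limit tokens are replaced by one memory token,
        rounding up when the over-limit part is not a multiple of `ratio`.
    """
    if prompt_len < 0 or allowed_len < 0 or ratio < 1:
        raise ValueError("lengths must be non-negative and ratio must be >= 1")

    kept = min(prompt_len, allowed_len)            # tokens passed through as-is
    over_limit = prompt_len - kept                 # tokens that exceed the limit
    memory_tokens = (over_limit + ratio - 1) // ratio  # integer ceiling division

    return {
        "kept": kept,
        "over_limit": over_limit,
        "memory_tokens": memory_tokens,
        "total": kept + memory_tokens,
    }


if __name__ == "__main__":
    # A 2048-token prompt with a 512-token allowance: 1536 over-limit tokens
    # shrink to 128 memory tokens, so the model processes 640 positions.
    print(compressed_length(2048, 512))
```

Under these assumptions, a prompt that fits within the allowance is left untouched, and only the excess is shortened by the compression ratio.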

Similar Papers

Commonsense Knowledge in Foundation and Large Language Models
Harsh Bhardwaj ... Maniya Tadhiyal
International Journal of Advanced Research in Science, Communication and Technology
08 Feb 2024

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
Galit Shmueli ... Bianca Maria Colosimo
INFORMS Journal on Data Science | VOL. 2
01 Apr 2023

Jigsaw
Naman Jain ... Arun Iyer
21 May 2022

A Large and Diverse Arabic Corpus for Language Modeling
Abbas Raza Ali ... Hasan Raza Ali
Procedia Computer Science | VOL. 225
01 Jan 2023
