The misuse of large language models (LLMs) is an ongoing concern with the expansion of general public access to LLMs. One reason for expanded access is the development of key-value (KV) cache quantization, a technique that significantly reduces both the large computing resource requirements and memory bottlenecks that are characteristic of LLMs. As more developers and vendors begin to prioritize efficiency in LLM training, protective measures against the misuse of language models become more of an afterthought. To address the expected increase of LLM misuse accompanying KV cache quantization, this paper covers a proof-of-concept benchmark to evaluate the proficiency of LLMs in response safety when tested against a sample of unsafe questions consisting of 13 different question categories. Response safety is a model’s ability to both clearly deny providing a response to a given question and avoid providing any additional information that attempts to provide an accurate answer. By testing the sample against the Meta Llama-2-7B pretrained chat model, we determine response safety fine-tuning considerations that address performance bias among the 13 question categories. We hope this study brings attention to not only the sacrifices of accuracy but also response safety of KV cache quantization in large language models. (Code and data are available at https://github.com/TimochiL/llm_benchmark.) Disclaimer: This paper contains examples of harmful language. Reader discretion is recommended.
Read full abstract