Research on adversarial attack and defense of large language models

Jidong Yang,Qiangyun Chi,Wenqiang Xu,Huaike Yu

doi:10.54254/2755-2721/93/20240922

Abstract

Large language models (LLMs) have made substantial progress in text and image understanding and generation. As these models are deployed across more industries, their security, and in particular their defense against adversarial attacks, has become a focus of research. This study examines the adversarial attacks that LLMs face and the strategies for defending against them, with emphasis on the design and optimization of defense mechanisms. Through a literature review and case studies, the paper analyzes white-box and black-box attack patterns against LLMs, including model inversion, backdoor attacks, and token-based strategies. In response, it proposes a set of defense strategies: preventive measures such as data augmentation, adversarial training, and model regularization, and real-time detection and response measures such as anomaly detection and adversarial sample detection. The core aim is to improve the robustness and trustworthiness of LLMs so that they can be integrated and sustained in industrial applications. The paper also proposes future research directions, highlighting the importance of developing advanced defense systems, promoting interdisciplinary research, and exploring new applications for LLMs. These findings offer insight into understanding and improving the security defense mechanisms of LLMs, which is essential for maintaining the security of these models and user trust in them.

Published in: Applied and Computational Engineering
