Abstract
This paper presents an exploratory systematic analysis of prompt injection vulnerabilities across 36 diverse large language models (LLMs), revealing significant security concerns in these widely adopted AI tools. Prompt injection attacks, which involve crafting inputs to manipulate LLM outputs, pose risks such as unauthorized access, data leaks, and misinformation. Through 144 tests with four tailored prompt injections, we found that 56% of attempts successfully bypassed LLM safeguards, with vulnerability rates ranging from 53% to 61% across different prompt designs. Notably, 28% of tested LLMs were susceptible to all four prompts, indicating a critical lack of robustness. Our findings show that model size and architecture significantly influence susceptibility, with smaller models generally more prone to attacks. Statistical methods, including random forest feature analysis and logistic regression, revealed that model parameters play a primary role in vulnerability, though LLM type also contributes. Clustering analysis further identified distinct vulnerability profiles based on model configuration, underscoring the need for multi-faceted defence strategies. The study's implications are broad, particularly for sectors integrating LLMs into sensitive applications. Our results align with OWASP and MITRE’s security frameworks, highlighting the urgency for proactive measures, such as human oversight and trust boundaries, to protect against prompt injection risks. Future research should explore multilingual prompt injections and multi-step attack defences to enhance the resilience of LLMs in complex, real-world environments. This work contributes valuable insights into LLM vulnerabilities, aiming to advance the field toward safer AI deployments.
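The abstract names three analyses (random forest feature importance, logistic regression, and clustering over model configuration features). The sketch below is a minimal, hypothetical illustration of that kind of pipeline using scikit-learn on synthetic data; the column names, encodings, and generated outcomes are assumptions, not the paper's actual dataset or code.

```python
# Hypothetical sketch of the statistical pipeline described in the abstract:
# random forest feature importance, logistic regression, and clustering over
# per-model vulnerability outcomes. All data here is synthetic and the feature
# names are assumptions for illustration only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_models = 36  # the study evaluated 36 LLMs

# Synthetic per-model features: parameter count (billions) and a coarse type label.
df = pd.DataFrame({
    "params_b": rng.choice([1, 3, 7, 13, 34, 70], size=n_models),
    "model_type": rng.integers(0, 3, size=n_models),  # encoded family/architecture
})
# Synthetic outcome: 1 if an injection prompt bypassed the model's safeguards.
# Smaller models are made more likely to fail, mirroring the reported trend.
df["vulnerable"] = (rng.random(n_models) < 1 / np.log10(df["params_b"] + 10)).astype(int)

X = df[["params_b", "model_type"]]
y = df["vulnerable"]

# 1) Random forest feature importance: which configuration feature drives vulnerability?
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(dict(zip(X.columns, rf.feature_importances_.round(3))))

# 2) Logistic regression: direction and magnitude of the parameter-count effect.
lr = LogisticRegression().fit(StandardScaler().fit_transform(X), y)
print(dict(zip(X.columns, lr.coef_[0].round(3))))

# 3) K-means clustering: group models into vulnerability profiles by configuration.
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)
)
print(df.groupby("cluster")["vulnerable"].mean())
```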