LLM Security

LLM security is the investigation of the failure modes of LLMs in use, the conditions that lead to them, and their mitigations.

Here are links to large language model security content - research, papers, and news - posted by @llm_sec

Got a tip/link? Open a pull request or send a DM.

Getting Started

Attacks

Adversarial

Backdoors & data poisoning

Prompt injection

Jailbreaking

Data extraction & privacy

Evasion

Malicious code

XSS/CSRF/CPRF

LLM causing self-XSS

Cross-model

Exploring the Vulnerability of Natural Language Processing Models via Universal Adversarial Texts

Multimodal

Model theft

Stealing Machine Learning Models via Prediction APIs

Attack automation

Defenses & Detections

against things other than backdoors

against backdoors / backdoor insertion

Evaluation

Practices

Analyses & surveys

Software

LLM-specific

BITE Textual Backdoor Attacks with Iterative Trigger Injection
garak LLM vulnerability scanner 🌶️🌶️
HouYi successful prompt injection framework 🌶️
dropbox/llm-security demo scripts & docs for LLM attacks
promptmap bulk testing of prompt injection on openai LLMs
rebuff LLM Prompt Injection Detector
risky llm input detection

general MLsec

Adversarial Robustness Toolkit
nvtrust Ancillary open source software to support confidential computing on NVIDIA GPUs

🌶️ = extra spicy