
LLM Security

LLM security is the investigation of the failure modes of LLMs in use, the conditions that lead to them, and their mitigations.

Here are links to large language model security content - research, papers, and news - posted by @llm_sec

Got a tip/link? Open a pull request or send a DM.

Getting Started

Attacks

Adversarial

Backdoors & data poisoning

Prompt injection

Jailbreaking

Data extraction & privacy

Data reconstruction

Denial of service

Escalation

  • Demystifying RCE Vulnerabilities in LLM-Integrated Apps 🌶️

Evasion

Malicious code

XSS / CSRF / Cross-Plugin Request Forgery (CPRF)

Cross-model

Multimodal

Model theft

Attack automation

Defenses & Detections

against attacks other than backdoors

against backdoors / backdoor insertion

Evaluation

Practices

Analyses & surveys

Software

LLM-specific

  • BITE Textual Backdoor Attacks with Iterative Trigger Injection
  • garak LLM vulnerability scanner 🌶️🌶️ (see the usage sketch after this list)
  • HouYi prompt injection attack framework 🌶️
  • dropbox/llm-security demo scripts & docs for LLM attacks
  • promptmap bulk testing of prompt injection attacks on OpenAI LLMs
  • rebuff LLM prompt injection detector
  • risky LLM input detection
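For quick orientation, here is a minimal sketch of running garak against a local model. The --model_type, --model_name, and --probes flags and the promptinject probe reflect garak's documented CLI at the time of writing; check `garak --help` on your installed version, since flags and probe names can change between releases.

    pip install garak
    # run garak's prompt injection probes against a local Hugging Face model
    python -m garak --model_type huggingface --model_name gpt2 --probes promptinject

Swapping in other probe modules (for example dan or lmrc) covers the jailbreaking and harm categories listed above; garak writes its findings to a report file for later triage.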

general MLsec

🌶️ = extra spicy