OWASP LLM Security Guide (2026)
Reference baseline: OWASP Top 10 for LLM Applications (2026) from OWASP GenAI Security Project.
Why LLM Security Is Different
LLM applications combine classic app risks with probabilistic model behavior. Real incidents often come from prompt manipulation, unsafe tool execution, or insecure retrieval paths rather than only network exploits.
OWASP LLM Top 10 (2026)
- LLM01 Prompt Injection
- LLM02 Sensitive Information Disclosure
- LLM03 Supply Chain
- LLM04 Data and Model Poisoning
- LLM05 Improper Output Handling
- LLM06 Excessive Agency
- LLM07 System Prompt Leakage
- LLM08 Vector and Embedding Weaknesses
- LLM09 Misinformation
- LLM10 Unbounded Consumption
Priority Risks First (Implementation Order)
LLM01Prompt InjectionLLM05Improper Output HandlingLLM06Excessive AgencyLLM02Sensitive Information DisclosureLLM10Unbounded Consumption
This order protects the highest-impact kill chains first: manipulate model -> force bad output -> execute privileged actions.
Risk-by-Risk Guidance
LLM01: Prompt Injection
What it is: attacker input alters model behavior and bypasses instruction intent.
Attack paths: - direct input manipulation in chat, including jailbreak roleplay and best-of-N retries, - indirect injection via RAG documents, links, files, and tool responses, - obfuscated or multimodal instructions.
Controls: - Separate trusted system policy from untrusted user and retrieved content. - Add policy classifiers for input and output. - Restrict model to explicit allowed tasks. - Require human-in-the-loop for high-risk actions. - Treat model output as untrusted data.
LLM02: Sensitive Information Disclosure
What it is: model reveals secrets, PII, tenant data, or internal architecture details.
Controls: - Minimize prompt context; include only required data. - Redact secrets and PII before prompt assembly. - Disable raw prompt/response logging by default. - Enforce retrieval authorization by user and tenant. - Add output DLP checks before returning responses.
LLM03: Supply Chain
What it is: compromised models, datasets, plugins, connectors, or dependencies.
Controls: - Maintain signed artifact provenance for models and prompts. - Pin model and dependency versions. - Review third-party tools/connectors before enablement. - Continuously scan dependencies and container images.
LLM04: Data and Model Poisoning
What it is: poisoned training, fine-tuning, or embedding data alters behavior.
Controls: - Gate data ingestion with quality and integrity checks. - Keep trusted data zones separate from user-contributed corpora. - Add anomaly detection for embedding outliers. - Use periodic benchmark regression tests for safety and quality drift.
LLM05: Improper Output Handling
What it is: downstream systems execute or render model output unsafely.
Examples: model-generated shell commands, SQL, HTML, templates, or API calls used without validation. Also includes Markdown-based data exfiltration payloads (image/link callbacks, hidden tracking URLs, and unsafe render targets).
Controls:
- Never execute raw output directly.
- Validate output against strict schemas.
- Context-aware escaping/sanitization (HTML, Markdown, SQL, shell).
- Render Markdown in safe mode; block remote image fetches, javascript:/data: URIs, and credentialed callback links.
- Deny dangerous commands/URLs/functions via allowlist-first policy.
LLM06: Excessive Agency
What it is: model has too much autonomy and can trigger irreversible actions.
Controls: - Least privilege for model identities and API tokens. - Read-only and write tools separated by default; enforce deny-by-default outbound egress with destination allowlists. - Multi-step confirmation for critical actions. - Per-action limits, dry-run mode, and rollback options. - Full audit trail for each tool call.
LLM07: System Prompt Leakage
What it is: attackers extract hidden instructions, policies, or internal metadata.
Controls: - Keep system prompts minimal and non-sensitive. - Store secrets in secure backends, never in prompt text. - Add leakage detection patterns and policy refusal behavior. - Rotate sensitive instruction templates when exposure is suspected.
LLM08: Vector and Embedding Weaknesses
What it is: RAG retrieval abuse, cross-tenant leakage, embedding attacks, and poisoned chunks.
Controls: - Enforce document-level ACLs in retrieval layer. - Segment vector indexes by tenant or sensitivity class. - Validate ingestion metadata and source trust. - Add retrieval relevance and grounding checks before answer generation.
LLM09: Misinformation
What it is: confident but false or biased outputs trigger bad business decisions.
Controls: - Require citations for high-impact responses. - Use confidence thresholds and abstain behavior. - Route critical domains (legal, medical, finance) to human review. - Track factuality metrics in regression tests.
LLM10: Unbounded Consumption
What it is: abuse of token usage, request loops, or expensive tool paths causing cost and availability impact.
Controls: - Token and request budgets per user/session/tenant. - Max iteration limits for agent loops. - Strict timeout and concurrency limits. - Budget alerts and auto-throttle policies.
Secure Architecture Blueprint
- Policy Enforcement Layer: centralized checks for input/output/action authorization.
- Tool Gateway: one controlled entrypoint with allowlists and argument validators.
- RAG Security Layer: ACL-aware retrieval, ingestion validation, poisoning defenses.
- Observability Layer: traces for prompts, retrieved chunks, tool calls, denials.
- Incident Controls: kill-switch for tools/models, emergency policies, key rotation.
CI/CD Security Gates
- Run automated LLM abuse tests on every merge request.
- Block release on unresolved High/Critical findings for injection, leakage, and unsafe actions.
- Require threat-model update when adding tool, connector, memory, or retrieval source.
- Require rollback runbook and explicit owner for high-impact agent capabilities.
Operational Metrics to Track
- Prompt injection detection rate.
- Unsafe output block rate.
- Unauthorized tool call attempts.
- Data leakage incidents per release.
- Token/cost spikes and throttling events.
Team Operating Rules
- Fail safely: if policy checks fail, deny action.
- Keep deterministic code controls authoritative over model intent.
- No direct privileged production action without explicit approval path.
- Every new LLM feature ships with negative abuse tests.
Additional Controls from Engineering Plan
- Memory security: isolate memory by user/tenant/session, enforce TTL, encrypt stored memory, and filter sensitive fields before persistence.
- Authentication and authorization: require strong user identity, role-aware retrieval/tool scopes, and deny cross-role action escalation.
- Sandboxing: execute generated code or risky tools only in isolated runtimes with network and filesystem restrictions.
- Compliance and privacy: map controls to GDPR/PII policies, enforce retention limits, and maintain data-subject deletion paths.
- Infrastructure security: protect APIs with mTLS/TLS, strong secret management, and network segmentation for model, vector DB, and tool backends.
- Failure modes: explicitly track hallucination, unsafe output, and tool misuse as incident classes with runbooks.
- Anti-patterns to avoid: blind trust in model output, no monitoring, no rate limits, and unrestricted agent permissions.
- Real-world scenarios: include prompt injection, poisoned RAG documents, and unauthorized tool execution in test catalogs.
- Defense in depth: combine input filtering, context isolation, output validation, and runtime monitoring.
- Staff-level heuristics: assume adversarial input, validate all boundaries, and default to least privilege everywhere.
Sources
- OWASP LLM Top 10 project: https://owasp.org/www-project-top-10-for-large-language-model-applications
- OWASP GenAI LLM Top 10 (2026): https://genai.owasp.org/llm-top-10/
- OWASP LLM01 Prompt Injection: https://genai.owasp.org/llmrisk/llm01-prompt-injection/
- NIST AI RMF GenAI Profile: https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence