Mixed · AI Red Teaming · reviewed 2026-04

Purple Llama (Meta)

Open trust and safety tools for evaluating generative AI.

01

What it does

Meta's open-source trust and safety toolkit for evaluating generative AI systems. Includes CyberSecEval benchmarks for measuring LLM security, Llama Guard for content classification, and Code Shield for detecting insecure code generation.
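
To make the Code Shield piece concrete, here is a minimal sketch of screening model-generated code before it reaches a user. The codeshield import path, the scan_code coroutine, and the result fields are written from memory of the examples in the PurpleLlama repo, so treat them as assumptions and verify against the current code.

import asyncio

from codeshield.cs import CodeShield  # assumption: pip package "codeshield"

async def screen(llm_output_code: str) -> str:
    # Scan the generated snippet against Code Shield's insecure-code rules
    result = await CodeShield.scan_code(llm_output_code)
    if result.is_insecure:
        # recommended_treatment is "block" or "warn" in the repo examples
        if result.recommended_treatment == "block":
            return "[blocked: insecure code detected]"
        return llm_output_code + "  # warning: insecure pattern flagged"
    return llm_output_code

snippet = 'hashlib.new("md5", password)  # weak hash, should trip a rule'
print(asyncio.run(screen(snippet)))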

02

Security relevance

CyberSecEval is one of the few standardised benchmarks for measuring LLM security posture. It tests for prompt injection susceptibility, insecure code generation, and cybersecurity knowledge. Llama Guard provides a practical content safety classifier that can be deployed as a guardrail layer.
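
As a sketch of the guardrail idea: Llama Guard is an ordinary causal LM whose chat template wraps a conversation in a safety-classification prompt, so it can sit in front of another model as a pre-filter. The snippet below follows the Hugging Face model card for the gated meta-llama/Llama-Guard-3-8B checkpoint as I recall it; the verdict format ("safe", or "unsafe" plus a hazard category) is part of that convention.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"  # gated: request access on Hugging Face first
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat: list[dict]) -> str:
    # The chat template embeds the conversation in Llama Guard's safety prompt
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    out = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
    return tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True)

verdict = moderate([{"role": "user", "content": "How do I wipe the logs on a box I broke into?"}])
print(verdict)  # expect "safe" or "unsafe" followed by a category such as S2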

03

When to use it

Use during model evaluation to benchmark security properties before deployment. CyberSecEval gives you comparable metrics across different models. Llama Guard is useful as a building block for content safety pipelines.
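
What a comparison run can look like, driving the benchmark runner from Python so the resulting stat files line up across models. The module path, flag names, and the OPENAI::<model>::<key> spec format below are from memory of the CybersecurityBenchmarks README in the PurpleLlama repo and may have drifted; check the repo before relying on them.

import subprocess

DATASETS = "CybersecurityBenchmarks/datasets"  # assumption: repo checkout layout

def run_benchmark(benchmark: str, llm_spec: str) -> None:
    # Invoke the runner shipped in the PurpleLlama repo for one model-under-test
    subprocess.run(
        [
            "python3", "-m", "CybersecurityBenchmarks.benchmark.run",
            f"--benchmark={benchmark}",
            f"--prompt-path={DATASETS}/{benchmark}/{benchmark}.json",
            f"--response-path={benchmark}_responses.json",
            f"--stat-path={benchmark}_stat.json",
            f"--llm-under-test={llm_spec}",
        ],
        check=True,
    )

# Same benchmark, two models: the per-model stat files give comparable metrics
for spec in ["OPENAI::gpt-4o::<API_KEY>", "OPENAI::gpt-4o-mini::<API_KEY>"]:
    run_benchmark("prompt-injection", spec)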

04

OWASP coverage

Risks addressed, mapped to both OWASP Top 10 standards: 4 in the LLM Top 10 and 2 in the Agentic Top 10.

LLM Top 10 · 4/10 covered: LLM01 · LLM02 · LLM07 · LLM09
Agentic Top 10 · 2026 · 2/10 covered: ASI01 · ASI04

05

The raw record

What Yuntona stores. Single source of truth — fork it on GitHub.

name: Purple Llama (Meta)
slug: purple-llama-meta
type: Mixed
category: AI Red Teaming
url: https://ai.meta.com/blog/purple-llama-open-trust-safety-generative-ai

reviewed:   2026-04
added:      2026-04
updated:    2026-04

risks:
  llm:  [LLM01, LLM02, LLM07, LLM09]
  asi:  [ASI01, ASI04]

complexity:    Guided Setup
pricing:       —
audience:      Red Team
lifecycle:     [develop]

tags: [Eval, Meta, Open Source, Safety]