Mixed · AI Red Teaming · reviewed 2026-04

Purple Llama (Meta)

Open trust and safety tools for evaluating generative AI.

01

What it does

Meta's open-source trust and safety toolkit for evaluating generative AI systems. Includes CyberSecEval benchmarks for measuring LLM security, Llama Guard for content classification, and Code Shield for detecting insecure code generation.
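
To make the Code Shield piece concrete, here is a minimal sketch of screening model-generated code before it reaches a user. The codeshield import path, the scan_code coroutine, and the result fields are written from memory of the examples in the PurpleLlama repo, so treat them as assumptions and verify against the current code.

import asyncio

from codeshield.cs import CodeShield  # assumption: pip package "codeshield"

async def screen(llm_output_code: str) -> str:
    # Scan the generated snippet against Code Shield's insecure-code rules
    result = await CodeShield.scan_code(llm_output_code)
    if result.is_insecure:
        # recommended_treatment is "block" or "warn" in the repo examples
        if result.recommended_treatment == "block":
            return "[blocked: insecure code detected]"
        return llm_output_code + "  # warning: insecure pattern flagged"
    return llm_output_code

snippet = 'hashlib.new("md5", password)  # weak hash, should trip a rule'
print(asyncio.run(screen(snippet)))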

02

Security relevance

CyberSecEval is one of the few standardised benchmarks for measuring LLM security posture. It tests for prompt injection susceptibility, insecure code generation, and cybersecurity knowledge. Llama Guard provides a practical content safety classifier that can be deployed as a guardrail layer.
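
As a sketch of the guardrail idea: Llama Guard is an ordinary causal LM whose chat template wraps a conversation in a safety-classification prompt, so it can sit in front of another model as a pre-filter. The snippet below follows the Hugging Face model card for the gated meta-llama/Llama-Guard-3-8B checkpoint as I recall it; the verdict format ("safe", or "unsafe" plus a hazard category) is part of that convention.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"  # gated: request access on Hugging Face first
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat: list[dict]) -> str:
    # The chat template embeds the conversation in Llama Guard's safety prompt
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    out = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
    return tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True)

verdict = moderate([{"role": "user", "content": "How do I wipe the logs on a box I broke into?"}])
print(verdict)  # expect "safe" or "unsafe" followed by a category such as S2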

03

When to use it

Use during model evaluation to benchmark security properties before deployment. CyberSecEval gives you comparable metrics across different models. Llama Guard is useful as a building block for content safety pipelines.
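
What a comparison run can look like, driving the benchmark runner from Python so the resulting stat files line up across models. The module path, flag names, and the OPENAI::<model>::<key> spec format below are from memory of the CybersecurityBenchmarks README in the PurpleLlama repo and may have drifted; check the repo before relying on them.

import subprocess

DATASETS = "CybersecurityBenchmarks/datasets"  # assumption: repo checkout layout

def run_benchmark(benchmark: str, llm_spec: str) -> None:
    # Invoke the runner shipped in the PurpleLlama repo for one model-under-test
    subprocess.run(
        [
            "python3", "-m", "CybersecurityBenchmarks.benchmark.run",
            f"--benchmark={benchmark}",
            f"--prompt-path={DATASETS}/{benchmark}/{benchmark}.json",
            f"--response-path={benchmark}_responses.json",
            f"--stat-path={benchmark}_stat.json",
            f"--llm-under-test={llm_spec}",
        ],
        check=True,
    )

# Same benchmark, two models: the per-model stat files give comparable metrics
for spec in ["OPENAI::gpt-4o::<API_KEY>", "OPENAI::gpt-4o-mini::<API_KEY>"]:
    run_benchmark("prompt-injection", spec)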

04

OWASP coverage

Risks addressed, mapped to both OWASP Top 10 standards: 4 in the LLM Top 10 and 2 in the Agentic Top 10.

LLM Top 10 · 4/10 covered: LLM01 · LLM02 · LLM07 · LLM09
Agentic Top 10 · 2026 · 2/10 covered: ASI01 · ASI04

05

The raw record

What Yuntona stores. Single source of truth — fork it on GitHub.

name: Purple Llama (Meta)
slug: purple-llama-meta
type: Mixed
category: AI Red Teaming
url: https://ai.meta.com/blog/purple-llama-open-trust-safety-generative-ai

reviewed:   2026-04
added:      2026-04
updated:    2026-04

risks:
  llm:  [LLM01, LLM02, LLM07, LLM09]
  asi:  [ASI01, ASI04]

complexity:    Guided Setup
pricing:       —
audience:      Red Team
lifecycle:     [develop]

tags: [Eval, Meta, Open Source, Safety]