OWASP Vendor Evaluation for AI Red Teaming
Evaluation criteria for assessing AI red teaming vendors and tools across simple and advanced GenAI systems. Green/red flags, discovery questions, and scoring checklist. Published Jan 2026.
What it does
OWASP GenAI Security Project publication (v1.0, Jan 2026) providing structured criteria for evaluating AI red teaming consultants and automated tools. Covers 13 evaluation categories: Technical Competence, Methodology & Coverage, Adversarial Creativity, Threat Modeling, Evaluation Rigor & Metrics, Tooling Quality, Data Governance, Transparency, Customization, Operational Fit, Limitations, Cost vs Value, and Legal/Compliance. Includes comparison matrix (consultants vs automated tools) and vendor evaluation checklist.
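The vendor evaluation checklist lends itself to a simple rubric. A minimal sketch in Python: the 13 category names come from the publication, while the flat weighting and the 0-5 scale are illustrative assumptions here, not OWASP's scoring.

```python
# Illustrative vendor scorecard over the 13 OWASP evaluation categories.
# The 0-5 scale and equal weighting are assumptions for this sketch.

CATEGORIES = [
    "Technical Competence",
    "Methodology & Coverage",
    "Adversarial Creativity",
    "Threat Modeling",
    "Evaluation Rigor & Metrics",
    "Tooling Quality",
    "Data Governance",
    "Transparency",
    "Customization",
    "Operational Fit",
    "Limitations",
    "Cost vs Value",
    "Legal/Compliance",
]

def score_vendor(ratings: dict[str, int]) -> float:
    """Average one 0-5 rating per category into a single 0-5 score.

    Missing categories raise an error, so gaps in an evaluation are
    surfaced rather than silently averaged away.
    """
    missing = [c for c in CATEGORIES if c not in ratings]
    if missing:
        raise ValueError(f"unscored categories: {missing}")
    return sum(ratings[c] for c in CATEGORIES) / len(CATEGORIES)

# Usage: rate every category, then compare vendors on one number.
ratings = {c: 3 for c in CATEGORIES}
ratings["Data Governance"] = 5
print(f"{score_vendor(ratings):.2f}")  # 3.15
```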
Security relevance
Distinguishes genuine adversarial evaluation from superficial testing. Identifies red flags like stock jailbreak libraries, no multi-turn capability, AI-generated evaluations without human oversight, and claims of full coverage. Specifies what effective testing looks like for MCP, tool-calling, and multi-agent systems.
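Those red flags read naturally as a screening checklist. A hedged sketch below: the flag descriptions mirror the list above, but the questionnaire keys are hypothetical field names, not part of the OWASP publication.

```python
# Screen a vendor questionnaire against the red flags named above.
# The questionnaire keys (e.g. "no_multi_turn") are hypothetical.

RED_FLAGS = {
    "stock_jailbreaks_only": "relies on stock jailbreak libraries",
    "no_multi_turn": "no multi-turn testing capability",
    "unreviewed_ai_evals": "AI-generated evaluations without human oversight",
    "claims_full_coverage": "claims full coverage",
}

def screen(questionnaire: dict[str, bool]) -> list[str]:
    """Return the red-flag descriptions triggered by a questionnaire."""
    return [desc for key, desc in RED_FLAGS.items() if questionnaire.get(key)]

print(screen({"no_multi_turn": True, "claims_full_coverage": True}))
# ['no multi-turn testing capability', 'claims full coverage']
```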
When to use it
Use when selecting or evaluating AI red teaming vendors. Essential procurement reference for CISOs building AI security testing programmes.
OWASP coverage
Risks addressed: mapped against both OWASP Top 10 standards, with 0 entries in the LLM Top 10 and 0 in the Agentic Top 10.
The raw record
What Yuntona stores. Single source of truth — fork it on GitHub.
name: OWASP Vendor Evaluation for AI Red Teaming
slug: owasp-vendor-evaluation-for-ai-red-teaming
type: Agentic
category: Education & Research
url: https://genai.owasp.org
reviewed: 2026-04
added: 2026-04
updated: 2026-04
risks:
  llm: []
  asi: []
complexity: Plug & Play
pricing: —
audience: CISO · GRC
lifecycle: [plan]
tags: [Free, Guide, OWASP, Procurement, Red Teaming, Vendor Assessment]
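Since the record is the single source of truth, a fork can sanity-check it before rendering. A minimal sketch assuming the record is saved as a hypothetical record.yaml and PyYAML is installed.

```python
# Load and sanity-check the raw record. Assumes PyYAML
# (pip install pyyaml) and a hypothetical record.yaml containing
# the YAML shown above.

import yaml

REQUIRED = {"name", "slug", "type", "category", "url", "lifecycle", "tags"}

with open("record.yaml") as fh:
    record = yaml.safe_load(fh)

missing = REQUIRED - record.keys()
if missing:
    raise ValueError(f"record is missing fields: {sorted(missing)}")

print(record["name"], "->", record["url"])
```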