What it does
An academic benchmark framework for evaluating the adversarial robustness of language models. Provides standardised evaluation of both attack methods and defence mechanisms, with automated red teaming across multiple attack vectors.
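The evaluation follows a generate-complete-judge loop: an attack method turns each harmful behaviour into an adversarial prompt, the target model (with any defence in place) produces a completion, and a classifier judges whether the behaviour was elicited. The Python sketch below illustrates that loop; the function names and signatures are hypothetical and do not reflect HarmBench's actual API.

# Hypothetical sketch of a HarmBench-style evaluation loop. Attack methods
# propose adversarial prompts for each behaviour, the target model responds,
# and a harm classifier judges whether the behaviour was elicited.
# Names and signatures are illustrative, not HarmBench's real interface.

from typing import Callable

def evaluate_attack(
    behaviours: list[str],                      # e.g. "Give step-by-step instructions for ..."
    attack: Callable[[str], str],               # turns a behaviour into an adversarial prompt
    target_model: Callable[[str], str],         # returns the model's completion
    classifier: Callable[[str, str], bool],     # True if the completion exhibits the behaviour
) -> float:
    """Return the attack success rate (ASR) over a set of harmful behaviours."""
    successes = 0
    for behaviour in behaviours:
        prompt = attack(behaviour)              # attack step (e.g. an automated or manual jailbreak)
        completion = target_model(prompt)       # any defence sits inside or in front of this call
        if classifier(behaviour, completion):   # judge step
            successes += 1
    return successes / len(behaviours)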
Security relevance
HarmBench offers one of the most rigorous academic evaluations of LLM safety. It tests models against a curated set of harmful behaviours and measures both the success rate of attacks and the robustness of refusals. Useful for comparing the safety properties of candidate models before making procurement decisions.
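As a rough illustration of the comparison this enables, per-attack success rates can be aggregated to rank candidate models. The numbers and model names below are purely illustrative placeholders, not HarmBench results.

# Hypothetical aggregation step for model selection: given per-attack ASR
# figures (illustrative values only), rank candidates by worst-case and mean
# attack success rate, lowest first.
asr = {
    "model-a": {"attack-1": 0.12, "attack-2": 0.30, "attack-3": 0.18},
    "model-b": {"attack-1": 0.05, "attack-2": 0.22, "attack-3": 0.10},
}

for model, per_attack in sorted(
    asr.items(), key=lambda kv: max(kv[1].values())   # rank by worst-case ASR
):
    mean_asr = sum(per_attack.values()) / len(per_attack)
    print(f"{model}: worst={max(per_attack.values()):.2f} mean={mean_asr:.2f}")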
When to use it
Use when you need academic-grade evaluation of model robustness, particularly for model selection decisions. Requires GPU infrastructure, model loading expertise, and familiarity with evaluation pipelines. Not a quick scan — this is deep evaluation work.
OWASP coverage
Risks addressed, mapped to both OWASP Top 10 standards: five in the LLM Top 10 (LLM01, LLM02, LLM03, LLM06, LLM09) and one in the Agentic Top 10 (ASI01).
The raw record
What Yuntona stores. Single source of truth — fork it on GitHub.
name: HarmBench
slug: harmbench
type: Mixed
category: AI Red Teaming
url: https://www.harmbench.org
reviewed: 2026-04
added: 2026-04
updated: 2026-04
risks:
  llm: [LLM01, LLM02, LLM03, LLM06, LLM09]
  asi: [ASI01]
complexity: Expert Required
pricing: —
audience: Red Team
lifecycle: [test]
tags: [Benchmark, Eval, Open Source]