Mixed · AI Red Teaming · reviewed 2026-04

HarmBench

Automated red teaming and robust refusal evaluation framework.

Visit www.harmbench.org
01

What it does

An academic benchmark framework for evaluating adversarial robustness of language models. Provides standardised evaluation of both attack methods and defence mechanisms, with automated red teaming capabilities across multiple attack vectors.
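In outline, the evaluation follows a three-stage pattern: an attack method turns each harmful behaviour into test-case prompts, the target model generates completions, and a classifier judges whether each completion carries out the behaviour. A minimal sketch of that loop follows; the callable names (make_test_cases, target_model, judge) are illustrative, not HarmBench's actual API, which is driven by the scripts and configs in its repository.

# Illustrative sketch of a HarmBench-style evaluation loop.
# The callables passed in are hypothetical stand-ins for the
# attack method, target model, and judge classifier.
def evaluate(behaviors, make_test_cases, target_model, judge):
    results = []
    for behavior in behaviors:
        # Stage 1: the attack expands a behaviour into adversarial prompts.
        for prompt in make_test_cases(behavior):
            # Stage 2: the target model responds to each prompt.
            completion = target_model(prompt)
            # Stage 3: a classifier decides whether the completion
            # actually carries out the harmful behaviour.
            results.append({
                "behavior": behavior,
                "prompt": prompt,
                "success": judge(behavior, completion),
            })
    return results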

02

Security relevance

HarmBench offers one of the most rigorous academic evaluations of LLM safety. It tests models against a curated set of harmful behaviours and measures both the success rate of attacks and the robustness of refusals. Useful for comparing model safety properties before procurement decisions.
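The headline metric is attack success rate (ASR): the fraction of test cases for which the classifier judges the completion harmful. A self-contained sketch, with refusal robustness read simply as the complement of ASR (one simplification among several possible):

def attack_success_rate(successes: list[bool]) -> float:
    # successes[i] is True when the classifier judged test case i
    # to have elicited the harmful behaviour.
    return sum(successes) / len(successes)

def refusal_robustness(successes: list[bool]) -> float:
    # One simple reading: how often the model held its refusal.
    return 1.0 - attack_success_rate(successes)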

03

When to use it

Use when you need academic-grade evaluation of model robustness, particularly for model selection decisions. Requires GPU infrastructure, model loading expertise, and familiarity with evaluation pipelines. Not a quick scan — this is deep evaluation work.
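To give a sense of the infrastructure bar: each target is a full model loaded into GPU memory, along the lines of the sketch below. The model name is only an example; HarmBench itself wires up targets through its own configuration files.

# Example of the model-loading work the pipeline assumes you can do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative target
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~2 bytes per parameter: ~14 GB for a 7B model
    device_map="auto",          # shard across available GPUs
)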

04

OWASP coverage

Risks addressed, mapped to both OWASP Top 10 standards: 5 in the LLM Top 10, 1 in the Agentic Top 10.

Agentic Top 10 · 2026 · 1/10 covered: ASI01
LLM Top 10 · 5/10 covered: LLM01, LLM02, LLM03, LLM06, LLM09
05

The raw record

What Yuntona stores. Single source of truth — fork it on GitHub.

name: HarmBench
slug: harmbench
type: Mixed
category: AI Red Teaming
url: https://www.harmbench.org

reviewed:   2026-04
added:      2026-04
updated:    2026-04

risks:
  llm:  [LLM01, LLM02, LLM03, LLM06, LLM09]
  asi:  [ASI01]

complexity:    Expert Required
pricing:       —
audience:      Red Team
lifecycle:     [test]

tags: [Benchmark, Eval, Open Source]
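
Since the record is plain YAML, downstream tooling can consume it directly. A hypothetical validation sketch (the field names come from the record above; PyYAML is assumed installed):

import yaml

REQUIRED = {"name", "slug", "type", "category", "url", "risks"}

def load_record(text: str) -> dict:
    # Parse the record and check the fields the directory relies on.
    record = yaml.safe_load(text)
    missing = REQUIRED - record.keys()
    if missing:
        raise ValueError(f"record missing fields: {sorted(missing)}")
    return record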