Mixed · AI Red Teaming · reviewed 2026-04

HarmBench

Automated red teaming and robust refusal evaluation framework.

Visit www.harmbench.org
01

What it does

An academic benchmark framework for evaluating adversarial robustness of language models. Provides standardised evaluation of both attack methods and defence mechanisms, with automated red teaming capabilities across multiple attack vectors.
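In outline, the evaluation follows a three-stage pattern: an attack method turns each harmful behaviour into test-case prompts, the target model generates completions, and a classifier judges whether each completion carries out the behaviour. A minimal sketch of that loop follows; the callable names (make_test_cases, target_model, judge) are illustrative, not HarmBench's actual API, which is driven by the scripts and configs in its repository.

# Illustrative sketch of a HarmBench-style evaluation loop.
# The callables passed in are hypothetical stand-ins for the
# attack method, target model, and judge classifier.
def evaluate(behaviors, make_test_cases, target_model, judge):
    results = []
    for behavior in behaviors:
        # Stage 1: the attack expands a behaviour into adversarial prompts.
        for prompt in make_test_cases(behavior):
            # Stage 2: the target model responds to each prompt.
            completion = target_model(prompt)
            # Stage 3: a classifier decides whether the completion
            # actually carries out the harmful behaviour.
            results.append({
                "behavior": behavior,
                "prompt": prompt,
                "success": judge(behavior, completion),
            })
    return results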

02

Security relevance

HarmBench offers one of the most rigorous academic evaluations of LLM safety. It tests models against a curated set of harmful behaviours and measures both the success rate of attacks and the robustness of refusals. Useful for comparing model safety properties before procurement decisions.
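The headline metric is attack success rate (ASR): the fraction of test cases for which the classifier judges the completion harmful. A self-contained sketch, with refusal robustness read simply as the complement of ASR (one simplification among several possible):

def attack_success_rate(successes: list[bool]) -> float:
    # successes[i] is True when the classifier judged test case i
    # to have elicited the harmful behaviour.
    return sum(successes) / len(successes)

def refusal_robustness(successes: list[bool]) -> float:
    # One simple reading: how often the model held its refusal.
    return 1.0 - attack_success_rate(successes)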

03

When to use it

Use when you need academic-grade evaluation of model robustness, particularly for model selection decisions. Requires GPU infrastructure, model loading expertise, and familiarity with evaluation pipelines. Not a quick scan — this is deep evaluation work.
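To give a sense of the infrastructure bar: each target is a full model loaded into GPU memory, along the lines of the sketch below. The model name is only an example; HarmBench itself wires up targets through its own configuration files.

# Example of the model-loading work the pipeline assumes you can do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative target
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~2 bytes per parameter: ~14 GB for a 7B model
    device_map="auto",          # shard across available GPUs
)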

04

OWASP coverage

Risks addressed, mapped to both OWASP Top 10 standards: 5 in the LLM Top 10, 1 in the Agentic Top 10.

Agentic Top 10 · 2026 · 1/10 covered: ASI01
LLM Top 10 · 5/10 covered: LLM01, LLM02, LLM03, LLM06, LLM09
05

The raw record

What Yuntona stores. Single source of truth — fork it on GitHub.

name: HarmBench
slug: harmbench
type: Mixed
category: AI Red Teaming
url: https://www.harmbench.org

reviewed:   2026-04
added:      2026-04
updated:    2026-04

risks:
  llm:  [LLM01, LLM02, LLM03, LLM06, LLM09]
  asi:  [ASI01]

complexity:    Expert Required
pricing:       —
audience:      Red Team
lifecycle:     [test]

tags: [Benchmark, Eval, Open Source]
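
Since the record is plain YAML, downstream tooling can consume it directly. A hypothetical validation sketch (the field names come from the record above; PyYAML is assumed installed):

import yaml

REQUIRED = {"name", "slug", "type", "category", "url", "risks"}

def load_record(text: str) -> dict:
    # Parse the record and check the fields the directory relies on.
    record = yaml.safe_load(text)
    missing = REQUIRED - record.keys()
    if missing:
        raise ValueError(f"record missing fields: {sorted(missing)}")
    return record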