RiskRubric
AI model risk report cards by Noma Security — A-F grades across six pillars, powered by Haize Labs red teaming.
What it does
A Noma Security project that scores 150+ LLMs on a 0-100 scale (graded A-F) across six weighted pillars: Security (25%), Reliability (20%), Privacy (20%), Transparency (15%), Safety & Societal Impact (15%), and Reputation (5%). Scores are generated from thousands of live, adaptive adversarial prompts per model, produced by Haize Labs' automated red teaming engine rather than pre-canned templates. Results are updated monthly and on every new model version release. Raw results are publicly available on Hugging Face: https://huggingface.co/datasets/nomasecurity/riskrubric-results
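The overall score behaves like a weighted average of the six pillar scores. The following is a minimal Python sketch of that arithmetic. It assumes each pillar is itself scored 0-100, and it assumes conventional 10-point grade bands (A ≥ 90, B ≥ 80, C ≥ 70, D ≥ 60, otherwise F), which is consistent with the C/70 example below. RiskRubric's exact aggregation and band cut-offs are not stated on this page, and `composite_score` and `letter_grade` are hypothetical helper names.

```python
# Pillar weights as listed for RiskRubric (they sum to 1.0).
PILLAR_WEIGHTS = {
    "security": 0.25,
    "reliability": 0.20,
    "privacy": 0.20,
    "transparency": 0.15,
    "safety_societal_impact": 0.15,
    "reputation": 0.05,
}

# ASSUMED grade bands; RiskRubric's real cut-offs may differ.
GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def composite_score(pillars: dict[str, float]) -> float:
    """Weighted average of 0-100 pillar scores (hypothetical helper)."""
    missing = PILLAR_WEIGHTS.keys() - pillars.keys()
    if missing:
        raise ValueError(f"missing pillar scores: {sorted(missing)}")
    return sum(PILLAR_WEIGHTS[name] * pillars[name] for name in PILLAR_WEIGHTS)


def letter_grade(score: float) -> str:
    """Map a 0-100 score to A-F using the assumed bands."""
    for cutoff, grade in GRADE_BANDS:
        if score >= cutoff:
            return grade
    return "F"


# Illustrative pillar scores only, not real RiskRubric results.
example = {
    "security": 80,
    "reliability": 90,
    "privacy": 70,
    "transparency": 60,
    "safety_societal_impact": 80,
    "reputation": 100,
}
score = composite_score(example)
print(round(score, 1), letter_grade(score))  # prints: 78.0 C
```

The weighting explains why pillars matter unequally. A 10-point change in Security shifts the composite by 2.5 points, while the same change in Reputation shifts it by only 0.5.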
Security relevance
When a business unit wants to use a new LLM, security teams need a quick, evidence-based way to assess it. RiskRubric provides comparable scores that map directly to OWASP LLM risks: prompt injection resilience (LLM01), output validation (LLM02), supply chain transparency (LLM03, LLM05), data privacy (LLM06), and overreliance indicators (LLM09). The weighting prioritises security and privacy (45% combined) over reputation (5%), which aligns with enterprise risk priorities. For procurement decisions, set a minimum grade threshold (e.g. C, a score of 70).
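A procurement gate built on a minimum threshold such as C/70 can be a simple filter over candidate models. This is a minimal sketch. The model names and scores are made up, `ModelCard` and `approved` are hypothetical names, and the optional Security-pillar floor is an assumed extra policy rather than something RiskRubric prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelCard:
    """Hypothetical summary of one model's RiskRubric results."""
    name: str
    overall: float   # composite 0-100 score
    security: float  # Security pillar 0-100 score


def approved(card: ModelCard, min_overall: float = 70.0,
             min_security: Optional[float] = None) -> bool:
    """Return True if the model clears the procurement thresholds."""
    if card.overall < min_overall:
        return False
    # Optional stricter floor on the Security pillar (assumed policy).
    if min_security is not None and card.security < min_security:
        return False
    return True


# Illustrative candidates only, not real scores.
candidates = [
    ModelCard("model-a", overall=82.0, security=74.0),
    ModelCard("model-b", overall=71.5, security=58.0),
    ModelCard("model-c", overall=64.0, security=80.0),
]

print([c.name for c in candidates if approved(c)])                     # ['model-a', 'model-b']
print([c.name for c in candidates if approved(c, min_security=60.0)])  # ['model-a']
```

Because the Security floor is optional, a team can start with a single grade threshold and tighten it later without changing the shape of the gate.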
When to use it
Use it during model selection and procurement. Reference it when business units request approval for new AI tools. The A-F grading system enables apples-to-apples comparison across competing models, which is essential for CISOs who need to justify model selection to their boards. When you need granular pillar-level data, check the raw scores in the Hugging Face dataset.
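Pillar-level data shows where two models with similar overall grades actually differ. The following is a minimal sketch over in-memory rows. It assumes one row per model with one numeric column per pillar. The column names (`model`, `security`, and so on) are hypothetical, because the dataset's real schema is not described here. In practice, the rows could plausibly be loaded with the Hugging Face `datasets.load_dataset` function, but the split names and layout are unverified.

```python
# HYPOTHETICAL pillar column names; check the dataset card for the real schema.
PILLARS = ["security", "reliability", "privacy",
           "transparency", "safety_societal_impact", "reputation"]


def rank_by_pillar(rows: list[dict], pillar: str) -> list[tuple[str, float]]:
    """Models sorted from highest to lowest score on one pillar."""
    return sorted(((row["model"], row[pillar]) for row in rows),
                  key=lambda pair: pair[1], reverse=True)


def pillar_diff(rows: list[dict], model_a: str, model_b: str) -> dict[str, float]:
    """Per-pillar score difference (model_a minus model_b)."""
    by_name = {row["model"]: row for row in rows}
    a, b = by_name[model_a], by_name[model_b]
    return {p: a[p] - b[p] for p in PILLARS}


# Illustrative rows only, not real RiskRubric data.
rows = [
    {"model": "model-a", "security": 85, "reliability": 70, "privacy": 75,
     "transparency": 60, "safety_societal_impact": 72, "reputation": 90},
    {"model": "model-b", "security": 65, "reliability": 88, "privacy": 80,
     "transparency": 70, "safety_societal_impact": 75, "reputation": 95},
]

print(rank_by_pillar(rows, "security"))
print(pillar_diff(rows, "model-a", "model-b"))
```

In this made-up data, model-a leads on Security by 20 points but trails on Reliability by 18. An overall grade alone would hide that trade-off.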
OWASP coverage
Risks addressed, mapped to both OWASP Top 10 standards: 6 in the LLM list, 2 in the Agentic list.
The raw record
What Yuntona stores. Single source of truth — fork it on GitHub.
name: RiskRubric
slug: riskrubric
type: Mixed
category: Foundation Models
url: https://riskrubric.ai
reviewed: 2026-04
added: 2026-04
updated: 2026-04
risks:
  llm: [LLM01, LLM02, LLM03, LLM05, LLM06, LLM09]
  asi: [ASI01, ASI06]
complexity: Plug & Play
pricing: —
audience: All
lifecycle: [scope]
tags: [Benchmark, Evaluation, Models, Noma Security, Risk]
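Because each record lists the OWASP IDs it addresses under `risks`, a consumer of the catalog can filter entries by risk. The following is a minimal sketch. It assumes the record has already been parsed into a Python dict (for example, with a YAML parser) and trims the record to a few fields for brevity. The `covers` and `coverage_counts` helpers are hypothetical.

```python
# A trimmed copy of the RiskRubric record above, already parsed into a dict.
record = {
    "name": "RiskRubric",
    "slug": "riskrubric",
    "category": "Foundation Models",
    "risks": {
        "llm": ["LLM01", "LLM02", "LLM03", "LLM05", "LLM06", "LLM09"],
        "asi": ["ASI01", "ASI06"],
    },
    "tags": ["Benchmark", "Evaluation", "Models", "Noma Security", "Risk"],
}


def covers(entry: dict, risk_id: str) -> bool:
    """True if the catalog entry lists risk_id under any OWASP standard."""
    return any(risk_id in ids for ids in entry.get("risks", {}).values())


def coverage_counts(entry: dict) -> dict[str, int]:
    """Number of mapped risks per standard."""
    return {std: len(ids) for std, ids in entry.get("risks", {}).items()}


print(covers(record, "LLM01"), covers(record, "ASI02"))  # True False
print(coverage_counts(record))                           # {'llm': 6, 'asi': 2}
```

The counts reproduce the "6 in LLM, 2 in Agentic" summary given in the OWASP coverage section.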