deepchecks
Open-source LLM evaluation and testing — continuous validation, bias detection, and regression testing.
What it does
Open-source testing framework for validating AI/ML models and LLM applications. Provides pre-built test suites for data validation, model evaluation, and LLM output testing. Tests for hallucination, bias, toxicity, and data integrity issues. Python-native with CI/CD integration. Listed in the Oversight layer of CB Insights' AI agent tech stack.
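As a sketch of the basic workflow, here is a pre-built tabular suite run end to end. The toy DataFrame and the "target" label column are placeholders for this example; the Dataset wrapper, data_integrity suite, and save_as_html call are from the documented deepchecks API.

```python
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import data_integrity

# Toy frame standing in for real training data; "target" is a placeholder label.
df = pd.DataFrame({
    "age": [34, 29, 41, 29, None, 52, 38, 29],
    "plan": ["free", "pro", "pro", "free", "free", "pro", "free", "pro"],
    "target": [0, 1, 1, 0, 0, 1, 0, 1],
})
ds = Dataset(df, label="target", cat_features=["plan"])

# Run the pre-built data-integrity suite and export an HTML report.
result = data_integrity().run(ds)
result.save_as_html("data_integrity_report.html")
```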
Security relevance
Provides the testing layer that catches AI quality and safety issues before deployment. Pre-built checks cover common failure modes: data drift, label errors, feature importance shifts, and LLM-specific issues like hallucination and prompt sensitivity. Open-source means full transparency into what's being tested and how.
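Individual checks can also run on their own. A minimal drift sketch with synthetic data follows; note that the check name tracks recent deepchecks releases (it was called TrainTestFeatureDrift in older versions), so verify against your installed version.

```python
import numpy as np
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import FeatureDrift

rng = np.random.default_rng(0)

# Toy train/test frames with a deliberately shifted feature;
# in practice these come straight from your pipeline.
train_ds = Dataset(pd.DataFrame({"x": rng.normal(0.0, 1.0, 500),
                                 "y": rng.integers(0, 2, 500)}), label="y")
test_ds = Dataset(pd.DataFrame({"x": rng.normal(2.0, 1.0, 500),
                                "y": rng.integers(0, 2, 500)}), label="y")

# FeatureDrift compares feature distributions between the two datasets
# (named TrainTestFeatureDrift in older deepchecks releases).
result = FeatureDrift().run(train_dataset=train_ds, test_dataset=test_ds)
print(result.value)  # per-feature drift scores keyed by feature name
```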
When to use it
Use during AI development to validate model quality and safety before deployment. Well suited to teams that want to add AI-specific testing to existing CI/CD pipelines without vendor lock-in. The Python SDK integrates with Jupyter, pytest, and standard ML workflows. Open-source core with a commercial cloud offering.
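A sketch of the pytest integration is below. The train_test_validation suite and SuiteResult helpers (passed, get_not_passed_checks) are from the documented API, but exact signatures vary across versions; the data-loading helper and file name are placeholders.

```python
# test_model_validation.py, collected by pytest in a CI job.
import numpy as np
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import train_test_validation


def _load_datasets():
    # Placeholder data; swap in your real train/test split.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"x": rng.normal(size=1000),
                       "y": rng.integers(0, 2, 1000)})
    train, test = df.iloc[:800], df.iloc[800:]
    return Dataset(train, label="y"), Dataset(test, label="y")


def test_train_test_validation():
    train_ds, test_ds = _load_datasets()
    result = train_test_validation().run(train_dataset=train_ds,
                                         test_dataset=test_ds)
    # Fail the build if any check's condition did not pass.
    assert result.passed(), [c.get_header() for c in result.get_not_passed_checks()]
```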
OWASP coverage
Risks addressed — mapped to both OWASP Top 10 standards: three in the LLM Top 10 and two in the Agentic list.
The raw record
What Yuntona stores. Single source of truth — fork it on GitHub.
name: deepchecks
slug: deepchecks
type: Mixed
category: AI Red Teaming
url: https://deepchecks.com
reviewed: 2026-04
added: 2026-04
updated: 2026-04
risks:
  llm: [LLM01, LLM04, LLM09]
  asi: [ASI01, ASI06]
complexity: Guided Setup
pricing: —
audience: Builder
lifecycle: [develop]
tags: [Evaluation, Open Source, Python, Testing]