Mixed · AI Red Teaming · reviewed 2026-04

deepchecks

Open-source LLM evaluation and testing — continuous validation, bias detection, and regression testing.

Visit deepchecks.com
01

What it does

Open-source testing framework for validating AI/ML models and LLM applications. Provides pre-built test suites for data validation, model evaluation, and LLM output testing, covering hallucination, bias, toxicity, and data integrity issues. Python-native with CI/CD integration. Appears in the Oversight layer of CB Insights' AI agent tech stack.

02

Security relevance

Provides the testing layer that catches AI quality and safety issues before deployment. Pre-built checks cover common failure modes: data drift, label errors, feature importance shifts, and LLM-specific issues like hallucination and prompt sensitivity. Open-source means full transparency into what's being tested and how.
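To make "data drift" concrete: a minimal, self-contained sketch of the kind of distribution check such suites automate, using the Population Stability Index (PSI). This is illustrative stdlib Python, not deepchecks' own implementation or API; the rule-of-thumb thresholds (0.1 stable, 0.2 drifting) are a common convention, not a deepchecks default.

```python
import math
from collections import Counter

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.
    Rule of thumb: < 0.1 stable, > 0.2 significant drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def bucket_fractions(xs):
        # Histogram each sample into the bins defined by the expected data;
        # clamp out-of-range values into the last bin.
        counts = Counter(min(int((x - lo) / width), bins - 1) for x in xs)
        n = len(xs)
        # Small epsilon avoids log(0) for empty buckets.
        return [max(counts.get(i, 0) / n, 1e-6) for i in range(bins)]
    e = bucket_fractions(expected)
    a = bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Toy data: training distribution vs. an identical and a shifted sample.
train   = [float(i % 100) for i in range(1000)]
same    = [float(i % 100) for i in range(1000)]
drifted = [float(i % 100) + 50.0 for i in range(1000)]

baseline = psi(train, same)     # no drift
shifted  = psi(train, drifted)  # clear drift
```

A pre-built suite runs dozens of checks like this per feature and flags the ones that cross a threshold, which is the work you would otherwise hand-roll per pipeline.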

03

When to use it

Use during AI development to validate model quality and safety before deployment. Excellent for teams that want to add AI-specific testing to existing CI/CD pipelines without vendor lock-in. Python SDK integrates with Jupyter, pytest, and standard ML workflows. Open-source core with commercial cloud offering.
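The pytest integration pattern amounts to turning an evaluation metric into a test that fails the CI build on regression. A minimal stdlib sketch, assuming a frozen eval set and a pinned baseline; the names (`load_predictions`, `BASELINE_ACCURACY`) are hypothetical, not deepchecks API:

```python
# Hypothetical CI gate: fail the build when model accuracy on a frozen
# eval set drops below a pinned baseline.

BASELINE_ACCURACY = 0.9

def accuracy(pairs):
    """Fraction of (predicted, expected) pairs that match."""
    return sum(p == e for p, e in pairs) / len(pairs)

def load_predictions():
    # Stand-in for reading the model's outputs on the eval set;
    # here: 10 examples, 9 predicted correctly.
    return [("cat", "cat")] * 9 + [("dog", "cat")]

def test_no_accuracy_regression():
    # pytest collects this automatically; CI goes red on regression.
    assert accuracy(load_predictions()) >= BASELINE_ACCURACY

test_no_accuracy_regression()
```

Pinning the baseline in the repo means a metric regression shows up in the same place as a failing unit test, with no separate dashboard to watch.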

04

OWASP coverage

Risks addressed — mapped to both OWASP Top 10 standards. 3 in LLM, 2 in Agentic.

LLM Top 10 · 3/10 covered: LLM01, LLM04, LLM09
Agentic Top 10 · 2026 · 2/10 covered: ASI01, ASI06
05

The raw record

What Yuntona stores. Single source of truth — fork it on GitHub.

name: deepchecks
slug: deepchecks
type: Mixed
category: AI Red Teaming
url: https://deepchecks.com

reviewed:   2026-04
added:      2026-04
updated:    2026-04

risks:
  llm:  [LLM01, LLM04, LLM09]
  asi:  [ASI01, ASI06]

complexity:    Guided Setup
pricing:       —
audience:      Builder
lifecycle:     [develop]

tags: [Evaluation, Open Source, Python, Testing]