deepchecks
Open-source LLM evaluation and testing — continuous validation, bias detection, and regression testing.
What it does
Open-source testing framework for validating AI/ML models and LLM applications. Provides pre-built test suites for data validation, model evaluation, and LLM output testing. Tests for hallucination, bias, toxicity, and data integrity issues. Python-native with CI/CD integration. Listed in the Oversight layer of CB Insights' AI agent tech stack.
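As a sketch of the basic workflow, here is a pre-built tabular suite run end to end. The toy DataFrame and the "target" label column are placeholders for this example; the Dataset wrapper, data_integrity suite, and save_as_html call are from the documented deepchecks API.

```python
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import data_integrity

# Toy frame standing in for real training data; "target" is a placeholder label.
df = pd.DataFrame({
    "age": [34, 29, 41, 29, None, 52, 38, 29],
    "plan": ["free", "pro", "pro", "free", "free", "pro", "free", "pro"],
    "target": [0, 1, 1, 0, 0, 1, 0, 1],
})
ds = Dataset(df, label="target", cat_features=["plan"])

# Run the pre-built data-integrity suite and export an HTML report.
result = data_integrity().run(ds)
result.save_as_html("data_integrity_report.html")
```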
Security relevance
Provides the testing layer that catches AI quality and safety issues before deployment. Pre-built checks cover common failure modes: data drift, label errors, feature importance shifts, and LLM-specific issues like hallucination and prompt sensitivity. Open-source means full transparency into what's being tested and how.
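Individual checks can also run on their own. A minimal drift sketch with synthetic data follows; note that the check name tracks recent deepchecks releases (it was called TrainTestFeatureDrift in older versions), so verify against your installed version.

```python
import numpy as np
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import FeatureDrift

rng = np.random.default_rng(0)

# Toy train/test frames with a deliberately shifted feature;
# in practice these come straight from your pipeline.
train_ds = Dataset(pd.DataFrame({"x": rng.normal(0.0, 1.0, 500),
                                 "y": rng.integers(0, 2, 500)}), label="y")
test_ds = Dataset(pd.DataFrame({"x": rng.normal(2.0, 1.0, 500),
                                "y": rng.integers(0, 2, 500)}), label="y")

# FeatureDrift compares feature distributions between the two datasets
# (named TrainTestFeatureDrift in older deepchecks releases).
result = FeatureDrift().run(train_dataset=train_ds, test_dataset=test_ds)
print(result.value)  # per-feature drift scores keyed by feature name
```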
When to use it
Use during AI development to validate model quality and safety before deployment. Well suited to teams that want to add AI-specific testing to existing CI/CD pipelines without vendor lock-in. The Python SDK integrates with Jupyter, pytest, and standard ML workflows. Open-source core with a commercial cloud offering.
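A sketch of the pytest integration is below. The train_test_validation suite and SuiteResult helpers (passed, get_not_passed_checks) are from the documented API, but exact signatures vary across versions; the data-loading helper and file name are placeholders.

```python
# test_model_validation.py, collected by pytest in a CI job.
import numpy as np
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import train_test_validation


def _load_datasets():
    # Placeholder data; swap in your real train/test split.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"x": rng.normal(size=1000),
                       "y": rng.integers(0, 2, 1000)})
    train, test = df.iloc[:800], df.iloc[800:]
    return Dataset(train, label="y"), Dataset(test, label="y")


def test_train_test_validation():
    train_ds, test_ds = _load_datasets()
    result = train_test_validation().run(train_dataset=train_ds,
                                         test_dataset=test_ds)
    # Fail the build if any check's condition did not pass.
    assert result.passed(), [c.get_header() for c in result.get_not_passed_checks()]
```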
OWASP coverage
Risks addressed — mapped to both OWASP Top 10 standards: three in the LLM Top 10 and two in the Agentic list.
The raw record
What Yuntona stores. Single source of truth — fork it on GitHub.
name: deepchecks
slug: deepchecks
type: Mixed
category: AI Red Teaming
url: https://deepchecks.com
reviewed: 2026-04
added: 2026-04
updated: 2026-04
risks:
  llm: [LLM01, LLM04, LLM09]
  asi: [ASI01, ASI06]
complexity: Guided Setup
pricing: —
audience: Builder
lifecycle: [develop]
tags: [Evaluation, Open Source, Python, Testing]