PromptFoo

CLI and library for testing, comparing, and red-teaming LLM prompts and providers.

Pricing

Freemium

Type

Automation

Languages

JavaScript, TypeScript, Python

// VERDICT

Reach for PromptFoo when you want quick, declarative prompt testing, model/prompt comparison, and built-in red-teaming from the CLI and CI. Skip it when you prefer code-first evals (DeepEval) or a hosted eval platform (Braintrust).

Best for

Config-driven testing and evaluation of prompts and LLM apps - declare test cases and assertions in YAML, compare prompts/models side by side, and run red-teaming for vulnerabilities, all CLI-first.

Avoid when

You want a fully hosted platform, code-first eval as unit tests, or you're not testing prompts/LLM outputs.

CI/CD fit

promptfoo CLI · YAML config · GitHub Actions · red-team scans

Team fit

Prompt engineers · Dev/QA testing LLM features · Teams comparing prompts/models

Setup

Easy

Maintenance

Low

Learning

Beginner

Licence

MIT (open-source core); paid cloud and enterprise tiers

// BEST FOR

Declaring prompt test cases and assertions in YAML
Comparing prompts and models side by side
Built-in red-teaming for LLM vulnerabilities
Running prompt evals from the CLI and CI
Catching prompt regressions on edits
A low-setup, config-first approach

// AVOID WHEN

You want a fully hosted eval+observability platform
You prefer code-first evals as unit tests
You're not testing prompts/LLM outputs
Deep custom metric logic beyond config is needed
A managed team UI is essential
Manual human eval is your only method

// QUICK START

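npx promptfoo@latest init
# Edit promptfooconfig.yaml: prompts, providers, tests (with asserts)
npx promptfoo@latest eval          # run the tests and compare prompts/models
npx promptfoo@latest view          # browse results in the local web UI
npx promptfoo@latest redteam init  # set up red-team scans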

// ALTERNATIVES TO CONSIDER

Tool	Choose it when
DeepEval	You want code-first evals with a pytest-like API.
OpenAI Evals	You want OpenAI's eval framework and registry.
Braintrust	You want a hosted eval platform with datasets and UI.

// FEATURES

Declarative YAML test cases for prompts and providers
Side-by-side comparison of model outputs
Built-in assertions including contains, JSON, and LLM-graded checks
Automated red-teaming probes for jailbreaks and PII leaks
Web UI for browsing eval results and history
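
When the built-in assertions are not enough, promptfoo can also call a Python file as an assertion. The sketch below assumes the documented `get_assert(output, context)` hook and a result dict with `pass`, `score` and `reason`. The `expected_key` variable is a hypothetical test variable made up for this example, not a promptfoo built-in. Treat it as an illustration and check the hook signature against the version you run.

```python
import json
from typing import Any, Dict


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Pass when the model output is a JSON object containing an expected key."""
    # `expected_key` is a hypothetical test variable; it defaults to "answer".
    expected_key = context.get("vars", {}).get("expected_key", "answer")

    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as exc:
        return {"pass": False, "score": 0.0, "reason": f"Not valid JSON: {exc}"}

    if not isinstance(parsed, dict) or expected_key not in parsed:
        # Valid JSON but wrong shape: fail with partial credit.
        return {"pass": False, "score": 0.5, "reason": f"Missing key '{expected_key}'"}

    return {"pass": True, "score": 1.0, "reason": f"Valid JSON containing '{expected_key}'"}
```

In the config, a test would reference the file with an assertion of `type: python` and a `value` such as `file://assert.py` (the file name is an assumption).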

// PROS

Low-friction CLI: runs via npx, with no Python or notebook setup required
Works with any model behind a CLI or HTTP endpoint
Red-team module covers common attack patterns such as jailbreaks and PII leaks
Easy to wire into CI pipelines

// CONS

YAML configs become unwieldy at scale
Cloud and enterprise features sit behind a paywall
Less Python-native than Python-first eval libraries such as DeepEval
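
One possible way to soften the first con is to generate the config from data instead of hand-writing every test. The sketch below builds a config with the `prompts`, `providers` and `tests` keys and writes it as JSON, which a YAML parser should also accept. The sample cases, the prompt text and the provider ID `openai:gpt-4o-mini` are assumptions chosen for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical cases; in practice these might come from a CSV or a database.
CASES: List[Dict[str, str]] = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
]


def build_config(cases: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build a promptfoo-style config dict with one test per case."""
    return {
        "description": "Generated regression suite",
        "prompts": ["Answer briefly: {{question}}"],
        # Assumed provider ID; substitute whichever provider you use.
        "providers": ["openai:gpt-4o-mini"],
        "tests": [
            {
                "vars": {"question": case["question"]},
                # Case-insensitive substring check on the model output.
                "assert": [{"type": "icontains", "value": case["expected"]}],
            }
            for case in cases
        ],
    }


def write_config(path: Path, cases: List[Dict[str, str]]) -> None:
    """Serialize the config as JSON text at the given path."""
    path.write_text(json.dumps(build_config(cases), indent=2), encoding="utf-8")

# Example: write_config(Path("promptfooconfig.yaml"), CASES)
```

The generator stays small while the list of cases grows, so reviewers diff the data rather than a long YAML file.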

// EXAMPLE QA WORKFLOW

Install promptfoo, or run it on demand with npx
Write a YAML config of prompts, providers and tests
Add assertions (contains, similarity, LLM-graded)
Compare prompts/models side by side
Run in CI and gate on regressions
Schedule red-team scans for vulnerabilities
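
The "gate on regressions" step can be sketched as a small script that reads a JSON report and fails the build when too many tests fail. It assumes the report was written with `promptfoo eval -o results.json` and has the shape `{"results": {"stats": {"failures": N}}}`. That shape, the file name and the `max_failures` threshold are assumptions to verify against a report from your own promptfoo version.

```python
import json
import sys
from typing import Any, Dict


def failure_count(report: Dict[str, Any]) -> int:
    """Return failures plus errors (if present) from a promptfoo-style report."""
    stats = report.get("results", {}).get("stats", {})
    return int(stats.get("failures", 0)) + int(stats.get("errors", 0))


def exit_code(report: Dict[str, Any], max_failures: int = 0) -> int:
    """Return 0 when failures are within the allowed threshold, else 1."""
    failed = failure_count(report)
    print(f"{failed} failing test(s); allowed: {max_failures}")
    return 0 if failed <= max_failures else 1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python gate.py results.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.exit(exit_code(json.load(fh)))
```

A CI job would run the eval first, then call this script on the report. Keeping the threshold at 0 blocks any regression, and raising it tolerates known flaky cases.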
