PromptFoo
CLI and library for testing, comparing, and red-teaming LLM prompts and providers.
Pricing
Freemium
Type
Automation
Languages
JavaScript, TypeScript, Python
// VERDICT
Reach for PromptFoo when you want quick, declarative prompt testing, model/prompt comparison and built-in red-teaming from the CLI and CI. Skip it when you prefer code-first evals (DeepEval) or a hosted eval platform.
Best for
Config-driven testing and evaluation of prompts and LLM apps - declare test cases and assertions in YAML, compare prompts/models side by side, and run red-teaming for vulnerabilities, all CLI-first.
Avoid when
You want a fully hosted platform, prefer code-first evals written as unit tests, or aren't testing prompts or LLM outputs.
CI/CD fit
promptfoo CLI · YAML config · GitHub Actions · red-team scans
Team fit
Prompt engineers · Dev/QA testing LLM features · Teams comparing prompts/models
// BEST FOR
- Declaring prompt test cases and assertions in YAML
- Comparing prompts and models side by side
- Built-in red-teaming for LLM vulnerabilities
- Running prompt evals from the CLI and CI
- Catching prompt regressions on edits
- A low-setup, config-first approach
// AVOID WHEN
- You want a fully hosted eval+observability platform
- You prefer code-first evals as unit tests
- You're not testing prompts/LLM outputs
- Deep custom metric logic beyond config is needed
- A managed team UI is essential
- Manual human eval is your only method
// QUICK START
npx promptfoo@latest init
# promptfooconfig.yaml: prompts, providers, tests (asserts)
npx promptfoo eval # compare prompts/models; use promptfoo redteam for scans
// ALTERNATIVES TO CONSIDER
| Tool | Choose it when |
|---|---|
| DeepEval | You want code-first evals with a pytest-like API. |
| OpenAI Evals | You want OpenAI's eval framework and registry. |
| Braintrust | You want a hosted eval platform with datasets and UI. |
// FEATURES
- Declarative YAML test cases for prompts and providers
- Side-by-side comparison of model outputs
- Built-in assertions including contains, JSON, and LLM-graded checks
- Automated red-teaming probes for jailbreaks and PII leaks
- Web UI for browsing eval results and history
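As a rough illustration of the features listed above, a promptfooconfig.yaml could look something like the sketch below. The prompt wording, the `ticket` variable, the two provider IDs, and the expected values are all assumptions made up for this example, and the OpenAI providers assume an `OPENAI_API_KEY` in the environment. Check assertion types and provider IDs against the promptfoo documentation for your version.

```yaml
# promptfooconfig.yaml (illustrative sketch; all values are hypothetical)
description: Support ticket triage checks

# Two prompt variants, evaluated against every provider and test case
prompts:
  - "Return a JSON object with keys summary and category for this ticket: {{ticket}}"
  - "You are a support lead. Reply only with JSON containing summary and category for: {{ticket}}"

# Two models to compare side by side
providers:
  - openai:gpt-4o-mini
  - openai:gpt-4o

tests:
  - vars:
      ticket: "I was charged twice for my subscription and would like a refund."
    assert:
      # Output must parse as JSON
      - type: is-json
      # Case-insensitive substring check
      - type: icontains
        value: refund
      # LLM-graded check against a plain-language rubric
      - type: llm-rubric
        value: The summary mentions a duplicate charge and does not invent details.
```

With two prompts, two providers, and one test case, a run produces four outputs, which is what makes the side-by-side view useful for spotting which combination regresses.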
// PROS
- Low-friction CLI: no Python or notebook setup required
- Works with any model behind a CLI or HTTP endpoint
- Red-team module covers a useful range of attack patterns
- Easy to wire into CI pipelines
// CONS
- YAML configs become unwieldy at scale
- Cloud and enterprise features sit behind a paywall
- Less Python-native than other LLM eval libraries
// EXAMPLE QA WORKFLOW
Install promptfoo (npx)
Write a YAML config of prompts, providers and tests
Add assertions (contains, similarity, LLM-graded)
Compare prompts/models side by side
Run in CI and gate on regressions
Schedule red-team scans for vulnerabilities
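The CI step in the workflow above might be wired into GitHub Actions roughly as follows. This is a sketch under stated assumptions, not an official template: the workflow file name, the `prompts/**` path filter, the Node version, and the `OPENAI_API_KEY` secret name are hypothetical, and it assumes the eval command exits with a non-zero code when assertions fail, which is what lets the job gate a pull request. Confirm that behaviour in the promptfoo documentation before relying on it.

```yaml
# .github/workflows/prompt-evals.yml (illustrative sketch; names are hypothetical)
name: prompt-evals

on:
  pull_request:
    paths:
      - "promptfooconfig.yaml"
      - "prompts/**"   # assumed location of prompt files

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - name: Run promptfoo eval
        env:
          # Provider credentials come from repository secrets
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        # Assumed to fail the job when any assertion fails
        run: npx promptfoo@latest eval -c promptfooconfig.yaml
```

Limiting the trigger to prompt and config paths keeps paid model calls off unrelated pull requests. Red-team scans could run from a separate workflow with a `schedule` trigger.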