PromptFoo

Freemium

CLI and library for testing, comparing, and red-teaming LLM prompts and providers.

Pricing

Freemium

Type

Automation

Languages

JavaScript, TypeScript, Python

// VERDICT

Reach for PromptFoo when you want quick, declarative prompt testing, model/prompt comparison and built-in red-teaming from the CLI and CI. Skip it when you prefer code-first evals (DeepEval) or a hosted eval platform.

Best for

Config-driven testing and evaluation of prompts and LLM apps - declare test cases and assertions in YAML, compare prompts/models side by side, and run red-teaming for vulnerabilities, all CLI-first.

Avoid when

You want a fully hosted platform or code-first evals written as unit tests, or you aren't testing prompts or LLM outputs at all.

CI/CD fit

promptfoo CLI · YAML config · GitHub Actions · red-team scans

Languages

JavaScript · TypeScript · Python

Team fit

Prompt engineers · Dev/QA testing LLM features · Teams comparing prompts/models

Setup

Easy

Maintenance

Low

Learning

Beginner

Licence

MIT (open-source core); paid cloud and enterprise tiers

// BEST FOR

  • Declaring prompt test cases and assertions in YAML
  • Comparing prompts and models side by side
  • Built-in red-teaming for LLM vulnerabilities
  • Running prompt evals from the CLI and CI
  • Catching prompt regressions on edits
  • A low-setup, config-first approach

// AVOID WHEN

  • You want a fully hosted eval+observability platform
  • You prefer code-first evals as unit tests
  • You're not testing prompts/LLM outputs
  • You need deep custom metric logic beyond what config can express
  • A managed team UI is essential
  • Manual human eval is your only method

// QUICK START

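npx promptfoo@latest init          # scaffold promptfooconfig.yaml
# promptfooconfig.yaml: prompts, providers, tests (with assertions)
npx promptfoo@latest eval          # run the tests and compare prompts/models
npx promptfoo@latest redteam run   # red-team scan (configure first with `redteam init`)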
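
As a rough illustration of what the config file in the quick start might contain, here is a minimal sketch that runs two prompts against two providers. The prompt wording, the model IDs and the test input are assumptions chosen for the example, not values from promptfoo's scaffold. Both providers expect their API keys in the environment.

```yaml
# promptfooconfig.yaml (illustrative sketch; prompts, model IDs and test data are assumptions)
description: Compare two summary prompts across two models

prompts:
  - "Summarize in one sentence: {{text}}"
  - "You are a concise editor. Give a one-sentence summary of: {{text}}"

providers:
  - openai:gpt-4o-mini                              # needs OPENAI_API_KEY
  - anthropic:messages:claude-3-5-haiku-20241022    # needs ANTHROPIC_API_KEY

tests:
  - vars:
      text: "The deploy failed because the database migration timed out after 30 seconds."
    assert:
      # case-insensitive substring check on the model output
      - type: icontains
        value: migration
```

Each test is run once per prompt and provider combination, so this file yields four outputs to compare side by side.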

// ALTERNATIVES TO CONSIDER

  • DeepEval: choose it when you want code-first evals with a pytest-like API.
  • OpenAI Evals: choose it when you want OpenAI's eval framework and registry.
  • Braintrust: choose it when you want a hosted eval platform with datasets and UI.

// FEATURES

  • Declarative YAML test cases for prompts and providers
  • Side-by-side comparison of model outputs
  • Built-in assertions including contains, JSON, and LLM-graded checks
  • Automated red-teaming probes for jailbreaks and PII leaks
  • Web UI for browsing eval results and history
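
The assertion types named in the list above could look roughly like this inside a `tests` block. This is a sketch under assumptions: the `{{message}}` variable, the expected values and the 0.8 threshold are invented for illustration, and the two tests are meant for two different hypothetical prompts (one that returns JSON, one that returns free text).

```yaml
# Fragment of promptfooconfig.yaml (illustrative; variables and expected values are assumptions)
tests:
  # For a prompt that is expected to return a JSON object
  - description: Ticket classification returns valid JSON
    vars:
      message: "My invoice is wrong and I want a refund."
    assert:
      - type: is-json          # output must parse as JSON
      - type: contains         # plain substring check
        value: refund
      - type: llm-rubric       # graded by a model against a natural-language rubric
        value: Classifies the message as a billing issue.

  # For a prompt that is expected to return free text
  - description: Summary stays close to a reference answer
    vars:
      message: "My invoice is wrong and I want a refund."
    assert:
      - type: similar          # embedding similarity to a reference string
        value: "The customer wants a refund because of an incorrect invoice."
        threshold: 0.8
```

The `llm-rubric` and `similar` checks call a grading or embedding model, so they add cost and some non-determinism compared with the deterministic `contains` and `is-json` checks.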

// PROS

  • Low-friction CLI — no Python or notebook setup required
  • Works with any model behind a CLI or HTTP endpoint
  • Red-team module covers a useful range of attack patterns
  • Easy to wire into CI pipelines
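
To make the HTTP-endpoint and red-team points concrete, here is a hedged sketch of a red-team config aimed at a custom chat endpoint. The URL, the request and response field names (`message`, `reply`), the purpose text and the choice of plugins and strategies are all assumptions; check the names against what `promptfoo redteam init` generates for your installed version.

```yaml
# promptfooconfig.yaml for a red-team scan (illustrative; endpoint and field names are hypothetical)
description: Red-team scan of a support chatbot

targets:
  - id: https
    label: support-bot
    config:
      url: https://example.com/api/chat      # placeholder endpoint
      method: POST
      headers:
        Content-Type: application/json
      body:
        message: "{{prompt}}"                # generated attack text is substituted here
      transformResponse: json.reply          # assumes the reply text is in a `reply` field

redteam:
  purpose: Customer support assistant for an online store
  numTests: 5          # test cases generated per plugin
  plugins:
    - pii              # probes for personal-data leaks
    - harmful          # probes for harmful-content generation
  strategies:
    - jailbreak
    - prompt-injection
```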

// CONS

  • YAML configs become unwieldy at scale
  • Cloud and enterprise features sit behind a paywall
  • Less Python-native than other LLM eval libraries
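
One way to keep a growing config manageable is to move prompts and test cases into separate files and put shared assertions in `defaultTest`. The sketch below assumes a hypothetical directory layout (`prompts/`, `tests/`) and an example rubric; adapt the paths to your repository.

```yaml
# promptfooconfig.yaml split across files (illustrative; file names are hypothetical)
prompts:
  - file://prompts/summarize.txt

providers:
  - openai:gpt-4o-mini

# Assertions applied to every test case
defaultTest:
  assert:
    - type: llm-rubric
      value: Does not invent facts that are absent from the input.

# Test cases live in their own files, one per feature area
tests:
  - file://tests/billing.yaml
  - file://tests/shipping.yaml
```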

// EXAMPLE QA WORKFLOW

  1. Install promptfoo (npx)

  2. Write a YAML config of prompts, providers and tests

  3. Add assertions (contains, similarity, LLM-graded)

  4. Compare prompts/models side by side

  5. Run in CI and gate on regressions

  6. Schedule red-team scans for vulnerabilities
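
Steps 5 and 6 above can be sketched as a single GitHub Actions workflow. This is a minimal example under assumptions: the workflow name, path filters, cron schedule, Node version and secret name are placeholders, and the red-team job assumes the repository's config already contains a `redteam` section. The gate relies on `promptfoo eval` exiting with a non-zero code when an assertion fails.

```yaml
# .github/workflows/prompt-evals.yml (illustrative; names, paths and schedule are assumptions)
name: prompt-evals

on:
  pull_request:
    paths:
      - "prompts/**"
      - "promptfooconfig.yaml"
  schedule:
    - cron: "0 3 * * 1"   # weekly, Mondays at 03:00 UTC

jobs:
  eval:
    # Gate pull requests on prompt regressions
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

  redteam:
    # Scheduled vulnerability scan
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npx promptfoo@latest redteam run
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

Restricting the pull-request trigger to prompt and config paths keeps API spend down, since the eval only runs when something that affects model output has changed.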
