Arize Phoenix

Freemium

Open-source LLM observability from Arize. Uses OpenInference — a widely-adopted set of OpenTelemetry semantic conventions for LLM spans — so instrumentation is portable across backends. Elastic License 2.0. Notebook-friendly, with a strong eval harness (Phoenix Evals) and embedding drift detection that's distinctive among open-source options.

Pricing

Freemium

Type

Automation

Languages

Python, TypeScript

// VERDICT

Reach for Arize Phoenix when you want open-source, OpenTelemetry-based tracing and evaluation for LLM/ML apps, runnable locally. Skip it when you want a fully managed platform or just config-driven prompt tests.

Best for

Open-source observability and evaluation for LLM and ML apps: OpenTelemetry-based tracing, evals, and drift analysis that you can run locally or self-host.

Avoid when

You want a fully managed vendor platform, or a simple config-only prompt tester.

CI/CD fit

OpenTelemetry tracing · local/self-host · eval runs

Team fit

LLM/ML app teams · Teams wanting OSS observability · Dev/QA debugging + evaluating

Setup

Easy

Maintenance

Low

Learning

Intermediate

Licence

Elastic License 2.0 (free to self-host; commercial Arize AX for enterprise scale)

// BEST FOR

  • Open-source LLM/ML tracing via OpenTelemetry
  • Running locally or self-hosted (no vendor lock-in)
  • Evaluating outputs and analysing drift
  • Debugging chains/agents from spans
  • Inspecting RAG retrieval and embeddings
  • Feeding traced failures into eval sets

// AVOID WHEN

  • You want a fully managed vendor platform
  • A simple config-only prompt tester is enough
  • You can't run/host the tool
  • You're not building LLM/ML apps
  • Turnkey enterprise support is essential
  • Only manual eval is needed

// QUICK START

pip install arize-phoenix
# launch Phoenix locally (the UI is served on http://localhost:6006 by default)
phoenix serve
# instrument the app via OpenTelemetry / OpenInference
# then inspect traces, run evals, analyse drift
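
The "instrument the app" step might look like the sketch below. It assumes the `arize-phoenix-otel` and `openinference-instrumentation-openai` packages are installed, that the app calls OpenAI (an assumption made for illustration; the listing does not say which provider you use), and that Phoenix is listening on its default local port. The project name and the `setup_tracing` helper are hypothetical.

```python
def setup_tracing(project_name="my-llm-app",
                  endpoint="http://localhost:6006/v1/traces"):
    """Point OpenTelemetry at a local Phoenix instance and auto-instrument OpenAI.

    Imports are deferred so this module still loads where the tracing
    packages are not installed (for example in unit tests).
    """
    from phoenix.otel import register
    from openinference.instrumentation.openai import OpenAIInstrumentor

    # register() builds an OpenTelemetry tracer provider that exports to Phoenix.
    tracer_provider = register(project_name=project_name, endpoint=endpoint)

    # From here on, OpenAI client calls emit OpenInference-style spans.
    OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider
```

Call `setup_tracing()` once at application start-up, before the first model call, so that every request is captured.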

// ALTERNATIVES TO CONSIDER

Tool       Choose it when
Langfuse   You want open-source tracing with prompt management.
LangSmith  You want a managed platform with datasets.
TruLens    You want feedback-function evaluation in code.

// FEATURES

  • OpenTelemetry-native with OpenInference semantic conventions — instrument once, send anywhere
  • Phoenix Evals — research-backed metrics covering agents, RAG, and safety
  • Embedding drift detection and RAG-specific quality metrics
  • Notebook-first workflow; runs in Colab or locally for rapid experimentation
  • Free open-source Phoenix; commercial Arize AX for enterprise scale
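
To make "embedding drift" concrete, here is a minimal sketch of one common way to quantify it: the Euclidean distance between the centroid of a reference embedding set and the centroid of a production set. It uses only the Python standard library and made-up 2-D vectors; it illustrates the idea and is not Phoenix's own implementation.

```python
import math
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(component) for component in zip(*vectors)]


def centroid_drift(reference, production):
    """Euclidean distance between the centroids of two embedding sets."""
    return math.dist(centroid(reference), centroid(production))


# Toy 2-D embeddings (made up for illustration).
reference = [[0.0, 0.0], [2.0, 0.0]]    # centroid (1.0, 0.0)
production = [[4.0, 4.0], [4.0, 4.0]]   # centroid (4.0, 4.0)

drift = centroid_drift(reference, production)
print(f"centroid drift: {drift:.2f}")   # prints "centroid drift: 5.00"
```

A value near zero suggests production inputs resemble the reference set; a growing value is a signal to inspect what changed.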

// PROS

  • OpenInference compatibility means no re-instrumentation if you ever migrate backends
  • Strongest eval-metric library among open-source options
  • Free Phoenix tier is unrestricted for self-hosting

// CONS

  • Trace UX is span-tree-first — no transcript view for long agent runs
  • Less purpose-built for production agent debugging than Laminar
  • Graduating to commercial Arize AX has a different cost curve — plan ahead if you need enterprise scale

// EXAMPLE QA WORKFLOW

  1. Install Phoenix (pip) and launch locally/self-hosted

  2. Instrument the app with OpenTelemetry

  3. Capture traces of LLM/RAG runs

  4. Evaluate outputs and analyse drift

  5. Debug failures from spans

  6. Feed traced failures into eval datasets
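
Steps 4 to 6 can be sketched in plain Python as follows. The `Span` class is a hypothetical, simplified stand-in for an exported trace span, the exact-match check is a toy evaluator standing in for a real eval, and the sample data is made up. The attribute keys `input.value` and `output.value` follow OpenInference naming.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified stand-in for an exported trace span."""
    name: str
    status: str        # OpenTelemetry status code: "OK" or "ERROR"
    attributes: dict   # OpenInference-style keys, e.g. "input.value"


def exact_match_eval(output: str, expected: str) -> bool:
    """Toy evaluator; a real setup might use an LLM-as-judge instead."""
    return output.strip().lower() == expected.strip().lower()


def failures_to_dataset(spans, expected_by_input):
    """Collect spans that errored or failed the eval into dataset rows."""
    rows = []
    for span in spans:
        question = span.attributes.get("input.value", "")
        answer = span.attributes.get("output.value", "")
        expected = expected_by_input.get(question)
        errored = span.status == "ERROR"
        wrong = expected is not None and not exact_match_eval(answer, expected)
        if errored or wrong:
            rows.append({"input": question, "output": answer,
                         "expected": expected, "span": span.name})
    return rows


spans = [
    Span("llm.call", "OK", {"input.value": "Capital of France?", "output.value": "Paris"}),
    Span("llm.call", "OK", {"input.value": "2 + 2?", "output.value": "5"}),
    Span("retriever", "ERROR", {"input.value": "Find docs on drift"}),
]
expected = {"Capital of France?": "paris", "2 + 2?": "4"}

dataset = failures_to_dataset(spans, expected)
for row in dataset:
    print(row["span"], "->", row["input"])
```

The resulting rows (one wrong answer, one errored retrieval) are the kind of records you would add to a regression eval set so that the same failures are re-checked on every change.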