Arize Phoenix

Open-source LLM observability from Arize. Uses OpenInference (a widely adopted set of OpenTelemetry semantic conventions for LLM spans), so instrumentation is portable across backends. Released under the Elastic License 2.0. Notebook-friendly, with a strong eval harness (Phoenix Evals) and embedding drift detection, a feature that is uncommon among open-source options.
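
To make "semantic conventions for LLM spans" concrete, here is a minimal sketch of the attributes one LLM call might carry. The keys follow OpenInference naming as we understand it; the helper `llm_span_attributes`, the model name, and all values are hypothetical, and a plain dict stands in for a real OpenTelemetry span.

```python
# Sketch of OpenInference-style attributes on one LLM span, as a plain dict.
# Values are made up for illustration; no tracing library is required.

def llm_span_attributes(model, prompt, completion, prompt_tokens, completion_tokens):
    """Build a flat attribute dict in the shape an OpenTelemetry span carries."""
    return {
        "openinference.span.kind": "LLM",
        "llm.model_name": model,
        "input.value": prompt,
        "output.value": completion,
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        "llm.token_count.total": prompt_tokens + completion_tokens,
    }

attrs = llm_span_attributes(
    "example-model", "What is Phoenix?", "An observability tool.", 12, 6
)
for key, value in attrs.items():
    print(f"{key} = {value}")
```

Because the keys are a shared convention rather than a vendor format, any backend that understands them can read the same spans, which is what makes the instrumentation portable.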

Pricing

Freemium

Type

Automation

Languages

Python, TypeScript

// VERDICT

Reach for Arize Phoenix when you want open-source, OpenTelemetry-based tracing and evaluation for LLM/ML apps, runnable locally. Skip it when you want a fully managed platform or just config-driven prompt tests.

Best for

Open-source observability and evaluation for LLM and ML apps: OpenTelemetry-based tracing, evals, and drift analysis you can run locally or self-host.

Avoid when

You want a fully managed vendor platform, or a simple config-only prompt tester.

CI/CD fit

OpenTelemetry tracing · local/self-host · eval runs

Languages

Python · TypeScript

Team fit

LLM/ML app teams · Teams wanting OSS observability · Dev/QA debugging + evaluating

Setup

Easy

Maintenance

Low

Learning

Intermediate

Licence

Elastic License 2.0 (free to self-host; commercial Arize AX available)

// BEST FOR

Open-source LLM/ML tracing via OpenTelemetry
Running locally or self-hosted (no vendor lock-in)
Evaluating outputs and analysing drift
Debugging chains/agents from spans
Inspecting RAG retrieval and embeddings
Feeding traced failures into eval sets
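
As a rough illustration of the last point, feeding traced failures into eval sets can be as simple as filtering exported spans by status. This is a sketch only: `TracedSpan` and `failures_to_eval_set` are hypothetical names, and the record shape is a simplification rather than Phoenix's export format.

```python
from dataclasses import dataclass


@dataclass
class TracedSpan:
    """Hypothetical, simplified record of one exported span."""
    span_id: str
    input_text: str
    output_text: str
    status: str  # "OK" or "ERROR"


def failures_to_eval_set(spans):
    """Keep only failed spans and reshape them into eval examples."""
    return [
        {"id": s.span_id, "input": s.input_text, "bad_output": s.output_text}
        for s in spans
        if s.status == "ERROR"
    ]


spans = [
    TracedSpan("a1", "Summarise the refund policy", "Refunds take 30 days.", "OK"),
    TracedSpan("b2", "Who wrote this contract?", "", "ERROR"),
    TracedSpan("c3", "List open invoices", "I cannot help with that.", "ERROR"),
]
eval_set = failures_to_eval_set(spans)
print(len(eval_set), [example["id"] for example in eval_set])
```

In practice the failing inputs become regression cases: rerun them after each prompt or model change and check that the bad outputs do not return.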

// AVOID WHEN

You want a fully managed vendor platform
A simple config-only prompt tester is enough
You can't run/host the tool
You're not building LLM/ML apps
Turnkey enterprise support is essential
Only manual eval is needed

// QUICK START

pip install arize-phoenix
phoenix serve   # launch Phoenix locally (UI on http://localhost:6006 by default)
# instrument the app via OpenTelemetry (OpenInference instrumentors)
# inspect traces, run evals, analyse drift
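
The instrumentation step might look like the sketch below. It assumes a Phoenix server is already running locally on port 6006 and that `phoenix.otel.register` accepts `project_name` and `endpoint` as in recent releases; check both against the version you install. `collector_endpoint`, `connect_tracing`, and the project name are hypothetical.

```python
def collector_endpoint(host="localhost", port=6006):
    """OTLP/HTTP traces URL for a locally running Phoenix (assumed default port)."""
    return f"http://{host}:{port}/v1/traces"


def connect_tracing(project_name="my-llm-app"):
    """Point OpenTelemetry at Phoenix; needs `pip install arize-phoenix` first."""
    # Imported lazily so this file still loads where Phoenix is not installed.
    from phoenix.otel import register

    return register(project_name=project_name, endpoint=collector_endpoint())


print(collector_endpoint())
```

Call `connect_tracing()` once at application start-up, then attach the OpenInference instrumentor for whichever LLM client or framework the app uses.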

// ALTERNATIVES TO CONSIDER

Tool	Choose it when
Langfuse	You want open-source tracing with prompt management.
LangSmith	You want a managed platform with datasets.
TruLens	You want feedback-function evaluation in code.

// FEATURES

OpenTelemetry-native with OpenInference semantic conventions — instrument once, send anywhere
Phoenix Evals — research-backed metrics covering agents, RAG, and safety
Embedding drift detection and RAG-specific quality metrics
Notebook-first workflow; runs in Colab or locally for rapid experimentation
Free open-source Phoenix; commercial Arize AX for enterprise scale
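
For intuition on embedding drift, one simple measure is the Euclidean distance between the centroid of a reference set and the centroid of production embeddings. The sketch below uses toy 2-D vectors and hypothetical helper names; it illustrates the idea and is not a claim about how Phoenix computes drift internally.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def centroid_drift(reference, production):
    """Euclidean distance between the two centroids; 0.0 means no shift."""
    return math.dist(centroid(reference), centroid(production))


reference = [[0.0, 0.0], [2.0, 0.0]]   # centroid (1.0, 0.0)
production = [[4.0, 4.0], [4.0, 4.0]]  # centroid (4.0, 4.0)
drift = centroid_drift(reference, production)
print(drift)
```

A rising value over time suggests production inputs are moving away from the data the app was built and evaluated on, which is a prompt to inspect the new clusters.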

// PROS

OpenInference compatibility means no re-instrumentation if you ever migrate backends
Broad eval-metric library (agents, RAG, safety), among the strongest of the open-source options
Self-hosted Phoenix is free to use, subject to the Elastic License 2.0 terms

// CONS

Trace UX is span-tree-first — no transcript view for long agent runs
Less purpose-built for production agent debugging than Laminar
Graduating to commercial Arize AX has a different cost curve — plan ahead if you need enterprise scale