The AI Infrastructure Stack
Sub-category 3.3

Evals, Observability & Tracing

Tools for measuring whether an AI system is actually working in production. In short, the Datadog of LLMs.

Players

Braintrust (Private); LangSmith (Private); Arize (Private); Weights & Biases (CRWV, acquired by CoreWeave); Galileo (Private); Humanloop (Private); Helicone (Private); Datadog LLM Observability (DDOG); New Relic AI Monitoring (Private)

Analysis coming soon. This page is scaffolding for deeper research into evals, observability, and tracing.