Back to Inference Glossary
Inference Glossary

LLM Observability

A replayable audit log of every model call — input, output, model, provider, latency, cost, and quality signals — plus real-time dashboards over the same data.

LLM observability is the practice of capturing every model call in a structured, queryable, replayable log and surfacing aggregate views over the captured data. It is the foundation that every other operational discipline — debugging, evaluation, fine-tune candidate selection, cost attribution, incident response, compliance review — depends on.

The minimum useful schema for an observability record is: input (prompt, messages, tool definitions), output (response, tool calls, finish reason), model name, provider, latency components (TTFT, ITL, total), token counts (input, output, cached), cost in dollars, and trace identifiers tying the call back to the upstream request. Quality signals — judge scores, heuristic results, user thumbs-up-or-down — go in the same record when available.

The "replayable" part matters. The first time an audit reviewer asks "what did the model say to user X on Tuesday at 3:14 PM," a system without observability cannot answer. The first time a regression appears and the team needs to bisect, a system without observability cannot bisect. The first time a fine-tune candidate needs hard examples, a system without observability cannot mine them.

The platform that owns the gateway is the only place observability can be cleanly captured across providers. A per-provider observability tool sees only its provider's slice. A platform-level one sees everything, attributes cost in the same units, and can correlate quality signals across models. Cumulus' Observability subsystem is the data substrate the rest of the platform reads.