See every dollar before the invoice arrives.
A real-time cost dashboard built on the gateway's own request log — no separate billing pipeline, no provider-CSV-of-the-month. Spend by provider, endpoint, and token, with a 30-day trend chart and an explicit 'saved by cache' block.
AI cost surprises are a monthly ritual.
Most teams find out about an LLM-spend blow-up two weeks after the fact, when the provider invoice lands and someone notices the line item doubled. By then a leaked token has been burning for ten days, an agent has been retrying a 4o-mini prompt 50× per second, or a noisy customer has been pinning the expensive model in a tight loop.
The cost dashboard pulls every request, every token count, every provider's pricing, and renders the result the moment it lands. You see "this token spent $400 today, this endpoint is up 3× since Monday, the cache is saving you $60/day on this prompt" — within minutes, not invoice cycles.
- Live aggregation — last-hour numbers are seconds-old
- Cache savings broken out so you can prove the cache is working
- Per-token breakdown — find the "one app that ate the budget"
- Driver-aware SQL — same dashboard works on SQLite and Postgres
— Spend (last 30 days) ——————————————
$1,847.20 total
-$182.40 saved by cache (9.9%)
+$214.30 vs. last 30d (13.1% ↑)

— By provider —
anthropic   $1,402.10   76%
openai        $312.60   17%
together       $98.40    5%
groq           $34.10    2%

— Top endpoints (last 7 days) —
/summarise-ticket     $412.30
/agent-proxy-claude   $287.60
/classify-intent       $94.20

— Top tokens (last 7 days) —
pg_live_xy7…  customer-A     $312.40
pg_live_qz3…  internal-bg    $187.20
pg_live_ab8…  agent-team-2   $156.10
# Pricing data is built into the gateway, kept current per release.
# Every request logs prompt_tokens + completion_tokens; the dashboard
# multiplies by the right model rate at query time.
provider: anthropic
model: claude-3-5-sonnet
prompt_tokens: 14_320
completion_tokens: 642
# → ($3/M × 14,320) + ($15/M × 642)
#   $0.04296 input + $0.00963 output = $0.05259

# Cache hits log with cost = $0 + a savings counter.
# When the cache returns a response, "what the call would have cost"
# is recorded so the dashboard can show savings honestly.
cache_hit: true
prompt_tokens_avoided: 14_320
completion_tokens_avoided: 642
# → savings: $0.05259
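The per-request arithmetic is a one-liner once you have the rates. A minimal sketch, assuming a pricing table keyed by model with per-million-token dollar rates (the table below is illustrative; the gateway ships and maintains its own):

```python
# Illustrative per-million-token rates; not the gateway's actual table.
PRICING = {
    "claude-3-5-sonnet": {"prompt": 3.00, "completion": 15.00},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request: tokens ÷ 1M × the model's per-million rate."""
    rates = PRICING[model]
    return (prompt_tokens / 1_000_000 * rates["prompt"]
            + completion_tokens / 1_000_000 * rates["completion"])

print(f"${request_cost('claude-3-5-sonnet', 14_320, 642):.5f}")  # → $0.05259
```

The same function applied to `*_tokens_avoided` gives the savings figure for cache hits.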
Built on the request log, not a separate pipeline.
Every request through the gateway already logs prompt_tokens, completion_tokens, the provider, the model, and the calling token. The cost dashboard is just SQL aggregation over that log, multiplied by the built-in pricing table the gateway keeps up to date with each release.
No separate billing system, no scheduled CSV imports, no third-party metering. The dashboard you see is the same data the audit log shows — just re-shaped.
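To make "just SQL aggregation" concrete, here is a minimal sketch against a hypothetical request_log table (column names are illustrative, not the gateway's actual schema). It runs on SQLite as shown; the same GROUP BY shape works on Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE request_log (
    provider TEXT, endpoint TEXT, token_id TEXT,
    cost REAL, cache_savings REAL, ts TEXT)""")
conn.executemany(
    "INSERT INTO request_log VALUES (?, ?, ?, ?, ?, ?)",
    [("anthropic", "/summarise-ticket", "pg_live_xy7", 0.052, 0.0,   "2025-01-10"),
     ("openai",    "/classify-intent",  "pg_live_qz3", 0.004, 0.0,   "2025-01-10"),
     ("anthropic", "/summarise-ticket", "pg_live_xy7", 0.0,   0.052, "2025-01-10")])

# Spend and cache savings by provider: the whole dashboard is queries like this.
for row in conn.execute("""
        SELECT provider, ROUND(SUM(cost), 3), ROUND(SUM(cache_savings), 3)
        FROM request_log GROUP BY provider ORDER BY 2 DESC"""):
    print(row)
```

Swap `GROUP BY provider` for `endpoint`, `token_id`, or a date truncation of `ts` and you have the other views.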
For streaming responses, the gateway parses the final usage frame from each provider (OpenAI stream_options.include_usage, Anthropic message_delta.usage, Cohere message-end, Google usageMetadata) and records exact token counts — not estimates.
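For the OpenAI case, a sketch of pulling that final usage frame out of an SSE stream. The chunk shape follows the documented stream_options.include_usage format (a last chunk with empty choices and a populated usage object); the helper name is ours:

```python
import json

def usage_from_sse(lines):
    """Scan SSE 'data:' lines and return the final chunk's usage counts."""
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # With stream_options.include_usage, the last chunk before [DONE]
        # carries the exact token counts for the whole stream.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage

stream = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    'data: {"choices": [], "usage": {"prompt_tokens": 14320, "completion_tokens": 642}}',
    "data: [DONE]",
]
print(usage_from_sse(stream))  # → {'prompt_tokens': 14320, 'completion_tokens': 642}
```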
Six lenses on the same data.
The same gateway log feeds six views — pick whichever answers the question you actually have.
By provider
Total + percentage split across OpenAI, Anthropic, Google, Cohere, Mistral, Groq, Together, Ollama. Trend per provider over time.
By endpoint
Which endpoints are eating the budget. Often the most actionable view — usually one or two endpoints account for >50% of spend.
By token
Per-API-token spend — find the "one client that's hammering it". Combined with token tags (post-launch) for team-level views.
By model
Spend per provider model. Useful when routing rules send different traffic to different models — see if your rules are doing what you think.
Cache savings
Explicit "saved by cache" line — what the response cache prevented you from spending. Justifies the cache TTLs.
30-day trend
Daily total chart. Picks up burn-rate increases days before the invoice does.
Find the spend before it finds you.
Plus per-endpoint budgets that reject the request with a 422 before the provider call once the cap is hit, and anomaly alerts that fire a webhook when the burn rate jumps.