Cost

See every dollar before the invoice arrives.

A real-time cost dashboard built on the gateway's own request log — no separate billing pipeline, no provider-CSV-of-the-month. Spend by provider, endpoint, and token, with a 30-day trend chart and an explicit 'saved by cache' block.

Coming soon · Cost docs ↗
WHY THIS MATTERS

AI cost surprises are a monthly ritual.

Most teams find out about an LLM-spend blow-up two weeks after the fact, when the provider invoice lands and someone notices the line item doubled. By then a leaked token has been burning for ten days, an agent has been retrying a 4o-mini prompt 50× per second, or a noisy customer has been pinning the expensive model in a tight loop.

The cost dashboard pulls every request, every token count, every provider's pricing, and renders the result the moment it lands. You see "this token spent $400 today, this endpoint is up 3× since Monday, the cache is saving you $60/day on this prompt" — within minutes, not invoice cycles.

  • Live aggregation — last-hour numbers are seconds-old
  • Cache savings broken out so you can prove the cache is working
  • Per-token breakdown — find the "one app that ate the budget"
  • Driver-aware SQL — same dashboard works on SQLite and Postgres
dashboard.txt
— Spend (last 30 days) ——————————————
   $1,847.20    total
    -$182.40    saved by cache (9.9%)
    +$214.30    vs. last 30d (13.1% ↑)

— By provider —
   anthropic    $1,402.10   76%
   openai         $312.60   17%
   together        $98.40    5%
   groq            $34.10    2%

— Top endpoints (last 7 days) —
   /summarise-ticket          $412.30
   /agent-proxy-claude        $287.60
   /classify-intent            $94.20

— Top tokens (last 7 days) —
   pg_live_xy7…  customer-A    $312.40
   pg_live_qz3…  internal-bg   $187.20
   pg_live_ab8…  agent-team-2  $156.10
how-pricing-resolves.yaml
# Pricing data is built into the gateway, kept current per release.
# Every request logs prompt_tokens + completion_tokens; the dashboard
# multiplies by the right model rate at query time.

provider: anthropic
model:    claude-3-5-sonnet
prompt_tokens:     14_320
completion_tokens:    642
# → ($3 × 14.32) + ($15 × 0.642)  per million
#   $0.04296 input + $0.00963 output = $0.05259

# Cache hits log with cost = $0 + a savings counter.
# When the cache returns a response, "what the call would have cost"
# is recorded so the dashboard can show savings honestly.

cache_hit:                 true
prompt_tokens_avoided:  14_320
completion_tokens_avoided: 642
# → savings: $0.05259
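The arithmetic above fits in a few lines. A minimal sketch — the rate table and function names here are illustrative, not the gateway's actual schema:

```python
# Per-million-token rates, (input, output) in USD. Illustrative values
# matching the example above, not a complete or current pricing table.
RATES = {("anthropic", "claude-3-5-sonnet"): (3.00, 15.00)}

def request_cost(provider: str, model: str,
                 prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one request: token counts times the per-million rate
    for each direction, summed."""
    rate_in, rate_out = RATES[(provider, model)]
    return (prompt_tokens * rate_in + completion_tokens * rate_out) / 1_000_000

cost = request_cost("anthropic", "claude-3-5-sonnet", 14_320, 642)
# 14,320 × $3/M = $0.04296 in; 642 × $15/M = $0.00963 out
print(round(cost, 5))  # 0.05259
```

A cache hit runs the same arithmetic on the avoided token counts and books the result as savings instead of spend.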
HOW IT WORKS

Built on the request log, not a separate pipeline.

Every request through the gateway already logs prompt_tokens, completion_tokens, the provider, the model, and the calling token. The cost dashboard is just SQL aggregation over that log, multiplied by the built-in pricing table the gateway keeps up to date with each release.

No separate billing system, no scheduled CSV imports, no third-party metering. The dashboard you see is the same data the audit log shows — just re-shaped.
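The "just SQL aggregation" claim can be sketched against an in-memory SQLite table. Table and column names below are assumptions for illustration, not the gateway's real log schema:

```python
import sqlite3

# A toy request log with the fields the page says every request records.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE request_log (
    provider TEXT, model TEXT, token_id TEXT,
    prompt_tokens INTEGER, completion_tokens INTEGER, cost REAL)""")
db.executemany(
    "INSERT INTO request_log VALUES (?, ?, ?, ?, ?, ?)",
    [("anthropic", "claude-3-5-sonnet", "pg_live_xy7", 14_320, 642, 0.05259),
     ("openai",    "gpt-4o-mini",       "pg_live_qz3",  2_000, 300, 0.00048)])

# Spend by provider — the same GROUP BY behind the "By provider" view;
# swapping in a different grouping column yields the other lenses.
for provider, spend in db.execute(
        "SELECT provider, ROUND(SUM(cost), 5) FROM request_log "
        "GROUP BY provider ORDER BY 2 DESC"):
    print(provider, spend)
```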

For streaming responses, the gateway parses the final usage frame from each provider (OpenAI stream_options.include_usage, Anthropic message_delta.usage, Cohere message-end, Google usageMetadata) and records exact token counts — not estimates.
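For the OpenAI case, extracting the usage frame looks roughly like this — a sketch that scans SSE `data:` lines for the final chunk carrying a non-null `usage` object (sent when `stream_options.include_usage` is set); the sample stream is fabricated:

```python
import json

def usage_from_openai_stream(sse_lines):
    """Scan SSE 'data:' lines and keep the last non-null usage object.
    Returns exact (prompt_tokens, completion_tokens) — not an estimate."""
    usage = None
    for line in sse_lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):  # only the final chunk carries usage
            usage = chunk["usage"]
    return usage["prompt_tokens"], usage["completion_tokens"]

stream = [
    'data: {"choices":[{"delta":{"content":"Hi"}}],"usage":null}',
    'data: {"choices":[],"usage":{"prompt_tokens":14320,"completion_tokens":642}}',
    "data: [DONE]",
]
print(usage_from_openai_stream(stream))  # (14320, 642)
```

The other providers differ only in where the counts live in the frame, not in the approach.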

WHAT GETS TRACKED

Five lenses on the same data.

The same gateway log feeds five views — pick whichever answers the question you actually have.

By provider

Total + percentage split across OpenAI, Anthropic, Google, Cohere, Mistral, Groq, Together, Ollama. Trend per provider over time.

By endpoint

Which endpoints are eating the budget. Often the most actionable view — usually one or two endpoints account for >50% of spend.

By token

Per-API-token spend — find the "one client that's hammering it". Combined with token tags (post-launch) for team-level views.

By model

Spend per provider model. Useful when routing rules send different traffic to different models — see if your rules are doing what you think.

Cache savings

Explicit "saved by cache" line — what the response cache prevented you from spending. Justifies the cache TTLs.

30-day trend

Daily total chart. Picks up burn-rate increases days before the invoice does.

Find the spend before it finds you.

Plus per-endpoint budgets that reject the request with a 422 before the provider call once the cap is hit, and anomaly alerts that fire a webhook when burn rate jumps.
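The budget gate reduces to one comparison per request. A hedged sketch — the caps, the in-memory ledger, and `check_budget` are illustrative, not the gateway's real store or API:

```python
BUDGETS = {"/summarise-ticket": 500.00}   # USD cap per period (illustrative)
spent   = {"/summarise-ticket": 499.98}   # running spend from the request log

def check_budget(endpoint: str, estimated_cost: float):
    """Return (status, detail). 422 is decided before any provider call,
    so a capped endpoint never spends past its budget."""
    cap = BUDGETS.get(endpoint)
    if cap is not None and spent.get(endpoint, 0.0) + estimated_cost > cap:
        return 422, f"budget exceeded for {endpoint}"
    return 200, "ok"

print(check_budget("/summarise-ticket", 0.05))
# (422, 'budget exceeded for /summarise-ticket')
```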

Coming soon · Anomaly alerts →