Smart routing, declarative rules.
Per-endpoint YAML rules decide which provider and which model gets the call — based on input size, schema presence, monthly spend, time of day, or any signal you wire up. Clients keep using one model name; the gateway picks the right backend.
Smart cost control without rewriting every call site.
The economics of LLMs are bimodal: cheap models handle 80% of work fine, expensive models are necessary for the rest. The naive answer — "always use the expensive one" — bleeds money. The slightly-less-naive answer — "let each app pick" — pushes routing logic into every client.
Routing rules invert that: the gateway holds the policy. Clients call model: "smart" or model: "cheap"; the rules behind the alias decide what actually runs. Change the policy and every client follows on its next request — without a single deploy on the app side.
- Long inputs route to a 200K-context model; short ones to a fast one
- JSON-schema enforcement on; route to a model that respects it
- Monthly budget burning fast; downgrade after 80% spend
- Off-hours; cheaper model since latency matters less
```yaml
# Per-endpoint rules — first match wins, fall through to default.
default:
  provider: anthropic
  model: claude-3-5-sonnet
rules:
  # Long inputs → bigger context window
  - if: "input_tokens > 100000"
    use:
      provider: google
      model: gemini-2.5-pro
  # Schema mode — must round-trip clean JSON
  - if: "has_schema"
    use:
      provider: openai
      model: gpt-4o
  # Burn-rate guard — switch to cheaper model past 80% of monthly budget
  - if: "monthly_spend_pct > 0.8"
    use:
      provider: groq
      model: llama-3.3-70b
  # Off-hours batch processing
  - if: "hour_local in [22, 23, 0, 1, 2, 3, 4, 5]"
    use:
      provider: together
      model: deepseek-r1
```
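The evaluation order — first matching rule wins, otherwise the default — can be sketched in a few lines of Python. The predicates here are plain callables standing in for the gateway's `if:` expression language; this is a sketch of the semantics, not the gateway's implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    predicate: Callable[[dict], bool]  # stands in for the "if:" expression
    use: dict                          # provider/model to dispatch to

def route(signals: dict, rules: list[Rule], default: dict) -> dict:
    """First match wins; fall through to the default backend."""
    for rule in rules:
        if rule.predicate(signals):
            return rule.use
    return default

# Predicates mirroring the YAML rules above.
rules = [
    Rule(lambda s: s["input_tokens"] > 100_000,
         {"provider": "google", "model": "gemini-2.5-pro"}),
    Rule(lambda s: s["has_schema"],
         {"provider": "openai", "model": "gpt-4o"}),
    Rule(lambda s: s["monthly_spend_pct"] > 0.8,
         {"provider": "groq", "model": "llama-3.3-70b"}),
    Rule(lambda s: s["hour_local"] in {22, 23, 0, 1, 2, 3, 4, 5},
         {"provider": "together", "model": "deepseek-r1"}),
]
default = {"provider": "anthropic", "model": "claude-3-5-sonnet"}

# A short, schema-free, mid-day request under budget falls through to the default.
print(route({"input_tokens": 1200, "has_schema": False,
             "monthly_spend_pct": 0.3, "hour_local": 14},
            rules, default))
# → {'provider': 'anthropic', 'model': 'claude-3-5-sonnet'}
```

Note that ordering matters: a 200K-token request that also carries a schema hits the long-input rule first, because earlier rules shadow later ones.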
What rules can match on.
Every routing decision lands in the audit log alongside the matched rule, so 'why did it pick that?' is always answerable after the fact.
input_tokens
Token count of the prompt before dispatch. Useful for context-window-aware routing.
has_schema
Boolean — true when the request includes a JSON schema. Route to schema-strict models.
has_tools
Boolean — true when tool definitions are passed. Route to tool-capable models.
monthly_spend_pct
Fraction of the endpoint's monthly USD budget already consumed. Burn-rate-aware downgrades.
time_of_day / hour_local / weekday
Time signals for cost or latency policies that vary across the day.
streaming
Boolean — true when the client requested SSE. Route to streaming-capable models only.
Custom JSONPath signals
Pull any value from the request body — $.metadata.priority, $.user_tier — and match on it.
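A custom signal might be declared and matched like the fragment below. The `signals:` block and the `priority` name are illustrative assumptions, not a documented schema — only the JSONPath expressions come from the text above:

```yaml
# Illustrative only — exact signal-declaration syntax may differ.
signals:
  priority: "$.metadata.priority"   # pulled from the request body
rules:
  - if: "priority == 'high'"
    use:
      provider: anthropic
      model: claude-3-5-sonnet
```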
Failover chain
Independent of rules: every endpoint can list secondary credentials that kick in on 5xx / rate-limit. Logged with the routing decision.
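A failover chain might sit alongside the rules like this. The shape and field names are assumptions for illustration; the documented behavior is only that secondaries take over on 5xx or rate-limit:

```yaml
# Illustrative — field names are assumptions, not a documented schema.
failover:
  - provider: anthropic          # primary credentials
    model: claude-3-5-sonnet
  - provider: openai             # tried on 5xx / rate-limit from the primary
    model: gpt-4o
```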
Move the policy, not the apps.
Edit a rule on the gateway and every client sees it on the next request; the audit log records which rule matched on every call.