AI Gateway
The most opinionated of the five types. Each endpoint pins a provider + model + prompt and validates input/output. Sessions, streaming, schemas, failover, budgets — all per-endpoint.
Endpoints as first-class artefacts
Each AI endpoint carries every knob you'd want for a production-quality LLM call.
Pinned provider + model
One endpoint = one upstream model. Or pick a Provider Template if you want to share configuration across endpoints.
System prompt + template
A system prompt and an optional {{input}}-templated user prompt. Re-render with new context, not a new endpoint.
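The `{{input}}` templating can be pictured as simple placeholder substitution. A minimal Python sketch — the gateway's actual template engine is not specified here, and `render_prompt` is a hypothetical name:

```python
import re

def render_prompt(template: str, context: dict) -> str:
    """Substitute {{key}} placeholders with values from context.
    Unknown placeholders are left intact. Illustrative sketch only."""
    def sub(match: re.Match) -> str:
        key = match.group(1)
        return str(context.get(key, match.group(0)))
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

# Re-render the same endpoint's template with new context —
# no new endpoint needed:
print(render_prompt("Summarize this: {{input}}", {"input": "Long article..."}))
# → Summarize this: Long article...
```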
JSON Schema validation
Optional input + output JSON Schemas enforced via justinrainbow/json-schema. 422 on invalid output.
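The gateway enforces schemas via justinrainbow/json-schema (PHP). As an illustration of the 422-on-invalid behavior, here is a hand-rolled Python stand-in — `check_schema` and the example schema are hypothetical, not the real validator:

```python
def check_schema(payload: dict, schema: dict) -> list[str]:
    """Minimal required/type check mirroring the gateway's server-side
    output validation. A non-empty error list would produce a 422."""
    types = {"string": str, "number": (int, float), "object": dict,
             "array": list, "boolean": bool}
    errors = []
    for key in schema.get("required", []):
        if key not in payload:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in payload and not isinstance(payload[key], types[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
    return errors

schema = {"required": ["summary"], "properties": {"summary": {"type": "string"}}}
print(check_schema({"summary": "ok"}, schema))  # → []
print(check_schema({"summary": 42}, schema))    # → ['summary: expected string']
```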
Streaming
Per-endpoint streaming_enabled flag. SSE pass-through with full provider semantics intact.
Sessions
Server-side conversation state with TTL, max-messages, max-tokens. Session UUID flows transparently through the gateway.
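The TTL and max-messages caps can be sketched as follows — class and field names are illustrative, not the gateway's actual data model; max-tokens would trim the same way:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Session:
    """Sketch of server-side conversation state."""
    ttl_seconds: int = 3600
    max_messages: int = 4
    created_at: float = field(default_factory=time.time)
    messages: list = field(default_factory=list)

    def expired(self, now=None) -> bool:
        now = time.time() if now is None else now
        return (now - self.created_at) > self.ttl_seconds

    def append(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Keep only the most recent max_messages entries.
        self.messages = self.messages[-self.max_messages:]

s = Session(max_messages=2)
for text in ["a", "b", "c"]:
    s.append("user", text)
print([m["content"] for m in s.messages])  # → ['b', 'c']
```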
Failover chain
Try secondary credentials when the primary fails. Transparent to the client; logged with the routing decision.
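The failover chain amounts to trying each credential in order and recording which one served the call. A hedged sketch — the function and the credential structure are hypothetical, not the gateway's internals:

```python
def call_with_failover(prompt, credentials):
    """Try each credential in order; return the response plus which
    provider served it, so the routing decision can be logged."""
    last_error = None
    for cred in credentials:
        try:
            return cred["call"](prompt), cred["name"]
        except Exception as exc:
            last_error = exc  # this provider failed; try the next one
    raise RuntimeError("all providers failed") from last_error

def flaky(prompt):   # hypothetical primary that is down
    raise TimeoutError("primary unavailable")

def backup(prompt):  # hypothetical secondary
    return f"response to: {prompt}"

result, provider = call_with_failover("hi", [
    {"name": "primary", "call": flaky},
    {"name": "secondary", "call": backup},
])
print(provider)  # → secondary (transparent to the client)
```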
Rate limits
Per-minute and per-hour caps. 429 + Retry-After header on breach. Cache-driver agnostic.
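A fixed-window sketch of the per-minute cap with the 429 + Retry-After response — the class is illustrative; the real gateway sits on whatever cache driver you configure:

```python
import time
from collections import defaultdict

class RateLimiter:
    """Fixed-window per-minute cap (illustrative; not the gateway's code)."""
    def __init__(self, per_minute: int):
        self.per_minute = per_minute
        self.windows = defaultdict(int)  # (token, minute-window) → count

    def check(self, token, now=None):
        now = time.time() if now is None else now
        window = int(now // 60)
        self.windows[(token, window)] += 1
        if self.windows[(token, window)] > self.per_minute:
            retry_after = 60 - int(now % 60)  # seconds until window reset
            return 429, {"Retry-After": str(retry_after)}
        return 200, {}

rl = RateLimiter(per_minute=2)
statuses = [rl.check("pg_live_token", now=120.0)[0] for _ in range(3)]
print(statuses)  # → [200, 200, 429]
```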
Budgets
Per-request token cap + monthly USD budget. 422 before the provider call when configured; null = unlimited.
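The pre-call budget gate can be pictured as a pair of checks, either of which short-circuits with a 422 before the provider is ever called. A sketch under assumed names (`budget_check` is hypothetical):

```python
def budget_check(requested_tokens, token_cap, spent_usd, monthly_budget_usd):
    """Return (status, reason): 422 before the provider call when a
    configured cap is exceeded; None means unlimited."""
    if token_cap is not None and requested_tokens > token_cap:
        return 422, "per-request token cap exceeded"
    if monthly_budget_usd is not None and spent_usd >= monthly_budget_usd:
        return 422, "monthly USD budget exhausted"
    return 200, "ok"

print(budget_check(900, 1000, 12.50, 20.0))   # → (200, 'ok')
print(budget_check(5000, 1000, 12.50, 20.0))  # 422: token cap
print(budget_check(900, None, 99.0, None))    # → (200, 'ok') — null = unlimited
```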
Routing rules (YAML)
Pick provider/model based on input size, schema presence, monthly_spend_pct, time_of_day. First-match-wins; no match → endpoint default.
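First-match-wins routing can be sketched as walking an ordered rule list. The gateway stores rules as YAML; shown here as the equivalent Python structure, with field names echoing the docs (input size, monthly_spend_pct) but the exact rule syntax and model names assumed:

```python
# Ordered rules: the first predicate that matches wins.
rules = [
    {"when": lambda ctx: ctx["input_chars"] > 20_000, "model": "big-context-model"},
    {"when": lambda ctx: ctx["monthly_spend_pct"] > 80, "model": "cheap-model"},
]

def route(ctx, default_model: str) -> str:
    """Return the model of the first matching rule, or the endpoint
    default when nothing matches."""
    for rule in rules:
        if rule["when"](ctx):
            return rule["model"]
    return default_model

print(route({"input_chars": 50_000, "monthly_spend_pct": 10}, "default-model"))
# → big-context-model
print(route({"input_chars": 100, "monthly_spend_pct": 10}, "default-model"))
# → default-model
```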
Call an AI Gateway endpoint
One slug per endpoint. The slug is the public path; the endpoint's pinned model + prompt + schema take care of everything else.
- System prompt baked in — clients only send a user message
- Output schema validated server-side
- Tokens, latency, status logged in gateway_logs
- Cost rolls into the project dashboard automatically
```bash
# POST to the endpoint slug
curl -X POST https://promptgate.your.co/api/<uuid>/summarize \
  -H "Authorization: Bearer pg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Long article goes here..."}
    ]
  }'
# → returns the validated, schema-constrained response
```
Pick this type when…
Use AI Gateway when
- You ship a real product feature with one or more LLM-backed endpoints
- You want to enforce a system prompt + schema on every call
- You need sessions, failover, budgets, or rate limits per endpoint
- You expose specific endpoints to specific clients (different scopes per token)
Pick something else when
- Your app uses an OpenAI SDK and you just want to swap the base URL → Agent Proxy or ai_wrapper
- You want to proxy a non-LLM HTTP API → API Gateway
- You want to aggregate multiple MCP servers behind one endpoint → MCP Gateway
Ready to ship?
Pull the image, create your first AI Gateway project, and define a single endpoint. The wizard takes you through every knob in seven tabs.