AI Gateway
The most opinionated of the five types. Each endpoint pins a provider + model + prompt and validates input/output. Sessions, streaming, schemas, failover, budgets — all per-endpoint.
Endpoints as first-class artefacts
Each AI endpoint carries every knob you'd want for a production-quality LLM call.
Pinned provider + model
One endpoint = one upstream model. Or pick a Provider Template if you want to share configuration across endpoints.
System prompt + template
A system prompt and an optional {{input}}-templated user prompt. Re-render with new context, not a new endpoint.
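The `{{input}}` templating can be pictured as simple placeholder substitution. A minimal Python sketch — the gateway's actual template engine is not specified here, and `render_prompt` is a hypothetical name:

```python
import re

def render_prompt(template: str, context: dict) -> str:
    """Substitute {{key}} placeholders with values from context.
    Unknown placeholders are left intact. Illustrative sketch only."""
    def sub(match: re.Match) -> str:
        key = match.group(1)
        return str(context.get(key, match.group(0)))
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

# Re-render the same endpoint's template with new context —
# no new endpoint needed:
print(render_prompt("Summarize this: {{input}}", {"input": "Long article..."}))
# → Summarize this: Long article...
```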
JSON Schema validation
Optional input + output JSON Schemas enforced via justinrainbow/json-schema. 422 on invalid output.
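The gateway enforces schemas via justinrainbow/json-schema (PHP). As an illustration of the 422-on-invalid behavior, here is a hand-rolled Python stand-in — `check_schema` and the example schema are hypothetical, not the real validator:

```python
def check_schema(payload: dict, schema: dict) -> list[str]:
    """Minimal required/type check mirroring the gateway's server-side
    output validation. A non-empty error list would produce a 422."""
    types = {"string": str, "number": (int, float), "object": dict,
             "array": list, "boolean": bool}
    errors = []
    for key in schema.get("required", []):
        if key not in payload:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in payload and not isinstance(payload[key], types[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
    return errors

schema = {"required": ["summary"], "properties": {"summary": {"type": "string"}}}
print(check_schema({"summary": "ok"}, schema))  # → []
print(check_schema({"summary": 42}, schema))    # → ['summary: expected string']
```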
Streaming
Per-endpoint streaming_enabled flag. SSE pass-through with full provider semantics intact.
Sessions
Server-side conversation state with TTL, max-messages, max-tokens. Session UUID flows transparently through the gateway.
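The TTL and max-messages caps can be sketched as follows — class and field names are illustrative, not the gateway's actual data model; max-tokens would trim the same way:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Session:
    """Sketch of server-side conversation state."""
    ttl_seconds: int = 3600
    max_messages: int = 4
    created_at: float = field(default_factory=time.time)
    messages: list = field(default_factory=list)

    def expired(self, now=None) -> bool:
        now = time.time() if now is None else now
        return (now - self.created_at) > self.ttl_seconds

    def append(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Keep only the most recent max_messages entries.
        self.messages = self.messages[-self.max_messages:]

s = Session(max_messages=2)
for text in ["a", "b", "c"]:
    s.append("user", text)
print([m["content"] for m in s.messages])  # → ['b', 'c']
```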
Failover chain
Try secondary credentials when the primary fails. Transparent to the client; logged with the routing decision.
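The failover chain amounts to trying each credential in order and recording which one served the call. A hedged sketch — the function and the credential structure are hypothetical, not the gateway's internals:

```python
def call_with_failover(prompt, credentials):
    """Try each credential in order; return the response plus which
    provider served it, so the routing decision can be logged."""
    last_error = None
    for cred in credentials:
        try:
            return cred["call"](prompt), cred["name"]
        except Exception as exc:
            last_error = exc  # this provider failed; try the next one
    raise RuntimeError("all providers failed") from last_error

def flaky(prompt):   # hypothetical primary that is down
    raise TimeoutError("primary unavailable")

def backup(prompt):  # hypothetical secondary
    return f"response to: {prompt}"

result, provider = call_with_failover("hi", [
    {"name": "primary", "call": flaky},
    {"name": "secondary", "call": backup},
])
print(provider)  # → secondary (transparent to the client)
```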
Rate limits
Per-minute and per-hour caps. 429 + Retry-After header on breach. Cache-driver agnostic.
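A fixed-window sketch of the per-minute cap with the 429 + Retry-After response — the class is illustrative; the real gateway sits on whatever cache driver you configure:

```python
import time
from collections import defaultdict

class RateLimiter:
    """Fixed-window per-minute cap (illustrative; not the gateway's code)."""
    def __init__(self, per_minute: int):
        self.per_minute = per_minute
        self.windows = defaultdict(int)  # (token, minute-window) → count

    def check(self, token, now=None):
        now = time.time() if now is None else now
        window = int(now // 60)
        self.windows[(token, window)] += 1
        if self.windows[(token, window)] > self.per_minute:
            retry_after = 60 - int(now % 60)  # seconds until window reset
            return 429, {"Retry-After": str(retry_after)}
        return 200, {}

rl = RateLimiter(per_minute=2)
statuses = [rl.check("pg_live_token", now=120.0)[0] for _ in range(3)]
print(statuses)  # → [200, 200, 429]
```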
Budgets
Per-request token cap + monthly USD budget. 422 before the provider call when configured; null = unlimited.
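The pre-call budget gate can be pictured as a pair of checks, either of which short-circuits with a 422 before the provider is ever called. A sketch under assumed names (`budget_check` is hypothetical):

```python
def budget_check(requested_tokens, token_cap, spent_usd, monthly_budget_usd):
    """Return (status, reason): 422 before the provider call when a
    configured cap is exceeded; None means unlimited."""
    if token_cap is not None and requested_tokens > token_cap:
        return 422, "per-request token cap exceeded"
    if monthly_budget_usd is not None and spent_usd >= monthly_budget_usd:
        return 422, "monthly USD budget exhausted"
    return 200, "ok"

print(budget_check(900, 1000, 12.50, 20.0))   # → (200, 'ok')
print(budget_check(5000, 1000, 12.50, 20.0))  # 422: token cap
print(budget_check(900, None, 99.0, None))    # → (200, 'ok') — null = unlimited
```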
Routing rules (YAML)
Pick provider/model based on input size, schema presence, monthly_spend_pct, time_of_day. First-match-wins; no match → endpoint default.
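First-match-wins routing can be sketched as walking an ordered rule list. The gateway stores rules as YAML; shown here as the equivalent Python structure, with field names echoing the docs (input size, monthly_spend_pct) but the exact rule syntax and model names assumed:

```python
# Ordered rules: the first predicate that matches wins.
rules = [
    {"when": lambda ctx: ctx["input_chars"] > 20_000, "model": "big-context-model"},
    {"when": lambda ctx: ctx["monthly_spend_pct"] > 80, "model": "cheap-model"},
]

def route(ctx, default_model: str) -> str:
    """Return the model of the first matching rule, or the endpoint
    default when nothing matches."""
    for rule in rules:
        if rule["when"](ctx):
            return rule["model"]
    return default_model

print(route({"input_chars": 50_000, "monthly_spend_pct": 10}, "default-model"))
# → big-context-model
print(route({"input_chars": 100, "monthly_spend_pct": 10}, "default-model"))
# → default-model
```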
Call an AI Gateway endpoint
One slug per endpoint. The slug is the public path; the endpoint's pinned model + prompt + schema take care of everything else.
- System prompt baked in — clients only send a user message
- Output schema validated server-side
- Tokens, latency, status logged in gateway_logs
- Cost rolls into the project dashboard automatically
```bash
# POST to the endpoint slug
curl -X POST https://promptgate.your.co/api/<uuid>/summarize \
  -H "Authorization: Bearer pg_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Long article goes here..."}
    ]
  }'
# → returns the validated, schema-constrained response
```
Pick this type when…
Use AI Gateway when
- You ship a real product feature with one or more LLM-backed endpoints
- You want to enforce a system prompt + schema on every call
- You need sessions, failover, budgets, or rate limits per endpoint
- You expose specific endpoints to specific clients (different scopes per token)
Pick something else when
- Your app uses an OpenAI SDK and you just want to swap the base URL → Agent Proxy or ai_wrapper
- You want to proxy a non-LLM HTTP API → API Gateway
- You want to aggregate multiple MCP servers behind one endpoint → MCP Gateway
Ready to ship?
Pull the image, create your first AI Gateway project, and define a single endpoint. The wizard takes you through every knob in seven tabs.