Alerts feature

The webhook that fires before a person notices.

Every five minutes, the detector checks error rate, p95 latency, and spend for each endpoint. When the current 5-minute bucket sits more than 3.5 × MAD (median absolute deviation) from that endpoint's median over the last 24 hours, the configured webhook fires. No per-endpoint thresholds to tune, no baselines to seed.

Alert docs ↗ (coming soon)

WHY THIS MATTERS

Static thresholds are wrong for AI traffic.

Pick a threshold like "alert if error rate > 5%" and you're either over-alerting (on a quiet endpoint where 5% is one failed request out of twenty) or under-alerting (on a busy one where 4.9% is hundreds of failed calls). Tune per endpoint and you have N tuning problems forever.

The detector skips the threshold step entirely. It looks at each endpoint's own recent distribution — last 24 hours of 5-minute buckets — and asks: "is the current bucket meaningfully outside the typical range for this endpoint?" If yes, alert. If no, stay quiet. Each endpoint calibrates itself.

No per-endpoint thresholds to tune
Quiet endpoints don't drown out the busy ones
Detector adapts as traffic patterns change
Three independent signals — error rate, p95 latency, spend
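One way to hold each endpoint's "own recent distribution" is a bounded window per endpoint and signal, so a quiet endpoint's history never mixes with a busy one's. This is a minimal sketch assuming in-memory storage; `BaselineStore`, `push`, and `history` are hypothetical names, not PromptGate's API.

```python
from collections import defaultdict, deque

BUCKET_MINUTES = 5
BASELINE_HOURS = 24
# 24 h of 5-minute buckets = 288 samples, matching "sample_size" in the payload.
BASELINE_BUCKETS = BASELINE_HOURS * 60 // BUCKET_MINUTES


class BaselineStore:
    """Hypothetical per-endpoint, per-signal rolling window of recent buckets."""

    def __init__(self, maxlen: int = BASELINE_BUCKETS) -> None:
        # Each (endpoint, signal) pair gets its own bounded deque, so every
        # endpoint is compared only against its own recent history.
        self._windows: dict[tuple[str, str], deque[float]] = defaultdict(
            lambda: deque(maxlen=maxlen)
        )

    def history(self, endpoint: str, signal: str) -> list[float]:
        """Return the stored bucket values for this endpoint/signal, oldest first."""
        return list(self._windows[(endpoint, signal)])

    def push(self, endpoint: str, signal: str, value: float) -> None:
        """Record a finished bucket; once the window is full, the oldest falls off."""
        self._windows[(endpoint, signal)].append(value)
```

Because the deque drops its oldest entry automatically, the baseline keeps sliding forward as traffic patterns change, with no explicit recalibration step.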

how-mad-works.txt text

— MAD = median absolute deviation —

# Last 24h of 5-minute buckets for an endpoint:
# error_rate per bucket = [0.01, 0.02, 0.0, 0.01, 0.02, ...]
median = 0.01
MAD    = median(|x - median|)  = 0.01

# Current bucket:
current = 0.085

# Is the current value far from the median, scaled by MAD?
distance = (current - median) / MAD     = 7.5  ← big

# Threshold for "anomalous": 3.5 × MAD above median.
# 7.5 > 3.5  →  fire the webhook.

# Why MAD, not standard deviation?
#   - Robust to outliers (one spike doesn't recalibrate the baseline)
#   - Works on small samples (5-min buckets, not millions of points)
#   - Doesn't assume the data is Gaussian

webhook-payload.json json

{
  "event": "endpoint.anomaly",
  "fired_at": "2026-05-09T13:42:00Z",
  "data": {
    "endpoint": {
      "slug": "summarise-ticket",
      "uuid": "e7c2…"
    },
    "signal": "error_rate",
    "current":    0.085,
    "median":     0.01,
    "mad":        0.01,
    "distance":   7.5,
    "threshold":  3.5,
    "window":     "5m",
    "baseline_window": "24h",
    "sample_size":     288,
    "top_errors": [
      {"status": 503, "count": 42},
      {"status": 429, "count": 18}
    ]
  }
}
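A receiver might turn this payload into a one-line on-call summary. The sketch below assumes only the fields shown above. `summarise_alert` is a hypothetical helper, and `top_errors` is treated as optional because the page only shows it on an `error_rate` event.

```python
import json


def summarise_alert(raw: bytes) -> str:
    """Build a one-line on-call summary from an endpoint.anomaly payload."""
    event = json.loads(raw)
    if event.get("event") != "endpoint.anomaly":
        raise ValueError(f"unexpected event type: {event.get('event')!r}")
    d = event["data"]
    # top_errors is treated as optional; the page only shows it for error_rate.
    top = ", ".join(f"{e['status']} ({e['count']})" for e in d.get("top_errors", []))
    line = (
        f"{d['endpoint']['slug']}: {d['signal']} = {d['current']} "
        f"(median {d['median']}, {d['distance']} MADs, threshold {d['threshold']})"
    )
    return f"{line}; top errors: {top}" if top else line
```

For the example payload this yields `summarise-ticket: error_rate = 0.085 (median 0.01, 7.5 MADs, threshold 3.5); top errors: 503 (42), 429 (18)`.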

WHAT FIRES

HMAC-signed webhook with the full picture.

The webhook payload carries the actual values the detector looked at, not just "something's wrong": the current value, median, MAD, distance, threshold, and the top error status codes that contributed. Whoever's on call gets enough to decide whether to wake someone up or wait it out.

Three signals fire independently:

error_rate — non-2xx as a fraction of total. Catches provider outages, rate limits, schema regressions.
latency_p95 — 95th-percentile response time per bucket. Catches slow upstreams and stuck streams.
spend — USD per bucket. Catches runaway agent loops, leaked tokens, cache drops.
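A rough sketch of how the three signals could be computed for one 5-minute bucket. Several details here are assumptions. The `Request` record and `bucket_signals` are hypothetical, and p95 uses the nearest-rank method because the page doesn't name one. An empty bucket yields no signals.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical per-request record; field names are assumptions."""
    status: int
    latency_ms: float
    cost_usd: float


def bucket_signals(requests: list[Request]) -> dict[str, float]:
    """Compute error_rate, latency_p95, and spend for one 5-minute bucket."""
    n = len(requests)
    if n == 0:
        return {}  # assumption: empty buckets are skipped rather than scored
    # error_rate: non-2xx responses as a fraction of all requests
    errors = sum(1 for r in requests if not 200 <= r.status < 300)
    # latency_p95: nearest-rank percentile; integer ceil(0.95 * n) avoids float error
    latencies = sorted(r.latency_ms for r in requests)
    rank = (95 * n + 99) // 100
    # spend: total USD across the bucket
    spend = sum(r.cost_usd for r in requests)
    return {
        "error_rate": errors / n,
        "latency_p95": latencies[rank - 1],
        "spend": spend,
    }
```

Each value would then be pushed into its own per-endpoint baseline and scored independently, which is what lets a latency spike fire without an error spike.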

Webhooks are HMAC-SHA256 signed with the per-webhook secret; same delivery infrastructure as the rest of PromptGate's webhooks (signed, retried with backoff, delivery log per attempt).
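Receivers should check the signature before trusting the payload. This minimal sketch assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body, keyed with the per-webhook secret. The header that carries it and any timestamp scheme aren't described here, so `verify_signature` is a hypothetical helper to adapt once the webhook docs are available.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check an assumed hex HMAC-SHA256 signature over the raw request body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so timing doesn't leak how many
    # leading characters matched; comparing bytes also tolerates non-ASCII input.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Verify against the raw bytes as received, before any JSON parsing, since re-serialising the payload can change whitespace or key order and break the match.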

Notice the spike, not the invoice.

Pair anomaly alerts with budgets — alerts surface the trend, budgets hard-stop before the bill compounds.

Cost dashboard → (coming soon)