Notifications carry validity, not values. A poke says "the frontier is at or beyond X" — never the data itself.
Restoring Truth and Sanity to Webhooks
Webhooks should be a poke with a hint and a ranged pull. Truth lives in a pullable ledger; sanity is a cursor you own. Everything else — signatures, retries, idempotency keys, replay consoles — is the cost of pretending delivery is a source of truth.
The contract is a cursor, not a delivery. The consumer owns one durable integer; the provider owns a readable ledger.
Every failure mode degrades to latency, never to corruption. Lost, duplicated, and reordered pokes are all harmless.
Move the contract from delivery to cursor.
The provider's side gets boring: best-effort fan-out plus a range-read endpoint it needed anyway. One design, internal and external.
Signed content-free poke
A best-effort notification that something changed: an account ID, a stream name, optionally a frontier hint. Nothing in it is load-bearing.
GET /changes?since=<cursor>
An authoritative range-read over an append-only ledger: ordered, bounded pages, opaque cursors, and a stated replay window.
Heartbeat floor
Consumers poll anyway every N minutes. A consumer that missed every poke self-heals without anyone operating a replay console.
Consumer-owned checkpoint
One row per consumer. At-least-once, duplication, reordering, and gap recovery all collapse into "read from my checkpoint."
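A minimal sketch of how the four pieces fit together, under stated assumptions: the in-memory `Ledger` stands in for the provider's store, `changes()` stands in for GET /changes?since=<cursor>, and the cursor is a plain integer offset rather than an opaque token. The names `Ledger`, `Consumer`, and `drain` are illustrative, not part of any spec.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Provider side: an append-only list of changes, read by range."""
    entries: list = field(default_factory=list)

    def append(self, change: str) -> int:
        self.entries.append(change)
        return len(self.entries)  # the new frontier

    def changes(self, since: int, limit: int = 2):
        """Stand-in for GET /changes?since=<cursor>: one ordered, bounded page."""
        page = self.entries[since:since + limit]
        return page, since + len(page)


@dataclass
class Consumer:
    """Consumer side: one durable integer plus whatever it derives from the ledger."""
    checkpoint: int = 0
    seen: list = field(default_factory=list)

    def drain(self, ledger: Ledger) -> None:
        """Runs on every poke and every heartbeat tick; both mean 'read from my checkpoint'."""
        while True:
            page, next_cursor = ledger.changes(self.checkpoint)
            if not page:
                return
            self.seen.extend(page)         # apply the page first...
            self.checkpoint = next_cursor  # ...then advance the cursor


ledger = Ledger()
consumer = Consumer()

for change in ["a", "b", "c"]:
    ledger.append(change)
# All three pokes were lost; the heartbeat fires instead.
consumer.drain(ledger)

ledger.append("d")
# The poke for "d" arrives twice.
consumer.drain(ledger)
consumer.drain(ledger)

print(consumer.seen, consumer.checkpoint)  # ['a', 'b', 'c', 'd'] 4
```

Lost pokes cost one heartbeat interval of latency; the duplicate poke costs one empty read. Neither changes what the consumer ends up holding.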
What payload-bearing webhooks actually cost.
The ecosystem default optimizes for the first hour of integration and pays for it forever after. Systems designed by their first demo choose push-with-payload; systems designed by their failure modes choose poke and pull.
Delivery becomes the correctness mechanism
The moment the notification carries the data, you owe signatures, ordering guarantees, per-consumer retry queues, idempotency keys, dead-letter handling, and a replay console — a product surface that exists only because the webhook is trying to be a source of truth while in flight.
The disclaimer is an unpriced liability transfer
"Here's webhooks, but don't rely on them" makes the consumer build both halves: the full push receiver and the reconciliation poller. The provider saves one redesign; every consumer pays the reconciliation engineering independently, usually after their first incident.
The payload's value goes negative
Follow the providers' own checklists — dedupe, tolerate reordering, re-fetch the object instead of trusting the embedded one — and the payload contributes nothing. It is pure attack surface plus a false sense of completeness, retained because JSON in the body demos well.
The workaround is the confession
Every serious shop puts a single receiver in front of an internal bus, re-materializing the provider's ledger locally to recover the offset-pull properties the provider had and declined to expose — seeded through the lossy channel, so it inherits the gaps.
Everyone at scale already converged here.
The pattern has quietly won at every provider that operates webhooks at real scale — which is the tell.
Content-free webhooks ("something changed for these accounts") plus a cursor-based delta pull.
Push notifications carry essentially nothing; you call changes.list with your page token.
Migrated payload-bearing transaction webhooks to /transactions/sync; docs describe the webhook as a signal to call sync now.
GET /v1/events is a pullable, cursor-paginated, ~30-day event ledger — shipped, but framed as the fallback while the lossy channel is framed as the product.
Cursor pull with replay IDs over gRPC.
Consumer-owned offsets everywhere; the "push" people perceive is a long-poll wakeup.
Append-only pages with stable URLs — a pullable ledger, fifteen years early. Nobody built the consumer runtime for it either.
Offset pull is multi-consumer by construction.
Each consumer's entire existence, from the producer's perspective, is one integer the producer doesn't even know about.
Permissionless consumers
A new consumer is just a reader with credentials. The producer holds no per-consumer state, and its read cost can stay close to constant as consumers are added, because ranged reads of immutable history are about as cacheable as a workload gets. You can put a CDN in front of an event log. Try CDN-ing a webhook.
Consumer isolation
A slow consumer is just behind; nobody else can tell. No provider-side retry queues backing up, no "we deregistered your endpoint because it failed too often."
History through the same channel
A webhook subscription starts at now; bootstrapping needs a separate backfill API and a gapless stitch. An offset consumer bootstraps by starting at zero. Same channel, same code path, no seam.
Replay as an operation, not a project
Rebuild after a bug: rewind the cursor. Test a new implementation: run it side-by-side at its own offset and diff the outputs. There is no webhook-shaped version of blue-green consumption.
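These four properties can be sketched in a few lines. Everything here is a hypothetical stand-in: `LEDGER` is a toy immutable history, `read_from` plays the ranged read, and `fold_v1` and `fold_v2` are two made-up consumer implementations being compared side by side.

```python
LEDGER = ["created:1", "updated:1", "created:2", "deleted:1"]  # immutable history


def read_from(offset: int) -> list:
    """Ranged read. The producer keeps no per-consumer state to serve it."""
    return LEDGER[offset:]


def fold_v1(events) -> int:
    """Current implementation: counts live objects, but forgets deletes."""
    return sum(1 for e in events if e.startswith("created"))


def fold_v2(events) -> int:
    """Candidate implementation: counts live objects, honoring deletes."""
    live = set()
    for event in events:
        kind, obj = event.split(":")
        if kind == "created":
            live.add(obj)
        elif kind == "deleted":
            live.discard(obj)
    return len(live)


# Each consumer's offset lives with the consumer. A slow one is just behind,
# and a brand-new one bootstraps from zero through the same read path.
offsets = {"billing": 4, "search": 2, "new-consumer": 0}
lag = {name: len(LEDGER) - offset for name, offset in offsets.items()}

# Replay and blue-green: rewind to zero, run both folds, diff the outputs.
v1 = fold_v1(read_from(0))
v2 = fold_v2(read_from(0))

print(lag)     # {'billing': 0, 'search': 2, 'new-consumer': 4}
print(v1, v2)  # 2 1
```

The diff between `v1` and `v2` surfaces the bug without touching the producer or the consumers already running.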
Three questions, in order.
The hybrid is not universal. Imperatives, lossy telemetry, sub-round-trip feeds, and ephemeral presence each want a different grain — the rubric tells you which.
Is there an authoritative versioned store the consumer could read?
No → push, with durability matched to the message: imperatives get durable, acked delivery; samples get fire-and-forget. Commands have no current value to re-pull — if the poke is lost, the intent is lost.
Is the notification reducible to a monotone fact?
A cursor, a version, a dirty bit. Yes → poke + pull; monotone facts form a lattice — duplicates coalesce, reorders take the max, losses are subsumed by any later poke or heartbeat. The channel can now be maximally cheap and unreliable.
Is floor latency tolerable when the channel fails?
Yes → done. No → still build poke + pull, then spend money hardening only the poke channel — the cheap half, because hardening it cannot corrupt anything.
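The second question is the one that makes the channel cheap, so it is worth seeing concretely. A small sketch, assuming the poke's hint is an integer frontier and that the consumer merges hints with `max`; `merge` and `observe` are illustrative names.

```python
from itertools import permutations


def merge(frontier: int, hint: int) -> int:
    """Join on the lattice of frontiers: keep the furthest point claimed so far."""
    return max(frontier, hint)


def observe(hints) -> int:
    """Fold a sequence of received pokes into the consumer's known frontier."""
    frontier = 0
    for hint in hints:
        frontier = merge(frontier, hint)
    return frontier


sent = [1, 2, 3, 4, 5]

# Duplicated (5 twice), reordered, and lossy (4 never arrives).
delivered = [3, 1, 5, 5, 2]
assert observe(delivered) == observe(sent) == 5

# Every arrival order gives the same answer.
assert {observe(order) for order in permutations(sent)} == {5}

# Losing the newest pokes leaves the consumer behind, never wrong.
stale = observe([1, 2, 3])
assert stale == 3
healed = merge(stale, 5)  # a later poke or a heartbeat pull reports frontier 5
assert healed == 5
```

Because the merge is commutative, associative, and idempotent, no delivery guarantee is needed to keep the frontier correct; the channel only affects how soon the consumer learns it.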
Ship the stateful half of the SDK.
Every provider ships the stateless client — the easy 20% — and abandons consumers at exactly the part that breaks in production. The fix is infrastructure modules, not library code: ~200 lines of IaC plus a handler stub, in three reference stacks (AWS, GCP, plain Postgres).
Poke receiver
Stateless endpoint or partner event-bus source. Its entire job is triggerPollNow().
Cursor row
One durable record per consumer in Dynamo, Postgres, or wherever state already lives.
Ranged-pull worker
A Lambda or container that reads /changes from the checkpoint, in order, in bounded pages.
Fan-out + DLQ
Per-stream or per-type dispatch to queues or a bus. The dead-letter path that push made mandatory is now merely optional.
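To make the four modules concrete, here is a single-process sketch rather than a reference stack. Assumptions are loud: SQLite stands in for the cursor row's store, `LEDGER` and `fetch_changes` stand in for the provider's /changes endpoint, the HMAC-SHA256 signing scheme and `SECRET` are hypothetical, and a `threading.Event` plays the role of triggerPollNow().

```python
import hashlib
import hmac
import sqlite3
import threading
from collections import defaultdict

SECRET = b"hypothetical-shared-secret"
LEDGER = [  # stand-in for the provider's ledger
    {"type": "invoice", "id": 1},
    {"type": "refund", "id": 2},
    {"type": "invoice", "id": 3},
    {"type": "mystery", "id": 4},
]
ROUTES = {"invoice", "refund"}  # types that have a downstream queue

# Cursor row: one durable record per consumer.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cursors (consumer TEXT PRIMARY KEY, position INTEGER NOT NULL)")
db.execute("INSERT INTO cursors VALUES ('billing', 0)")
db.commit()

poll_now = threading.Event()
queues = defaultdict(list)
dead_letters = []


def sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def receive_poke(body: bytes, signature: str) -> int:
    """Poke receiver: verify the signature, request a poll, do nothing else."""
    if not hmac.compare_digest(sign(body), signature):
        return 401
    poll_now.set()
    return 202


def fetch_changes(since: int, limit: int = 2):
    """Stand-in for the ranged read: one ordered, bounded page."""
    page = LEDGER[since:since + limit]
    return page, since + len(page)


def run_worker(consumer: str) -> None:
    """Ranged-pull worker: read from the checkpoint, fan out, commit per page."""
    poll_now.clear()
    row = db.execute("SELECT position FROM cursors WHERE consumer = ?", (consumer,)).fetchone()
    position = row[0]
    while True:
        page, next_position = fetch_changes(position)
        if not page:
            return
        for change in page:
            target = queues[change["type"]] if change["type"] in ROUTES else dead_letters
            target.append(change)
        db.execute("UPDATE cursors SET position = ? WHERE consumer = ?", (next_position, consumer))
        db.commit()
        position = next_position


body = b'{"account": "acct_1", "stream": "changes"}'
rejected = receive_poke(body, "not-a-valid-signature")
accepted = receive_poke(body, sign(body))
if poll_now.is_set():
    run_worker("billing")

position = db.execute("SELECT position FROM cursors WHERE consumer = 'billing'").fetchone()[0]
print(rejected, accepted, position, len(queues["invoice"]), len(dead_letters))  # 401 202 4 2 1
```

The cursor is committed only after a page has been dispatched, so a crash mid-page replays that page instead of skipping it; downstream handlers should therefore tolerate a repeated change.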
Doctrine drafted. Spec next.
doctrine (Drafted): this page, distilled from field notes on poke + ranged-pull systems.
changes-endpoint spec (Sketch): the tiny RFC covering opaque cursors, ordered bounded pages, stated replay window, signed content-free poke.
consumer runtime modules (Roadmap): Terraform/CDK reference stacks for AWS, GCP, and Postgres-in-a-container.