Designing a Resilient Webhook Consumer

Webhooks are the glue of the modern web. When an event occurs in a service such as Stripe, GitHub, or Shopify, that service sends an HTTP POST request to your server to notify you.

While sending webhooks is relatively straightforward, consuming them reliably at scale is a common engineering challenge. Providers retry failed deliveries only for a limited time, so if your server is down, slow, or returning errors for too long, you can miss critical business events.

Here is how to design a webhook consumer that does not silently drop events.

1. Decouple Reception from Processing

The golden rule of webhook design is: do not process webhooks synchronously inside the HTTP request handler.

If a webhook endpoint takes 5 seconds to handle each request (e.g., parsing data, saving to a database, and emailing a user), you will quickly run out of database connections or server threads during peak traffic. Many providers also enforce a short response timeout: if your handler is too slow, the provider treats the delivery as failed and retries it. And if processing fails halfway, that retry repeats work that was already partially done.
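A rough capacity estimate shows why slow handlers hurt. By Little's law, a pool of N request-handling threads that each spend W seconds per request can sustain at most N / W requests per second. The pool size of 20 threads below is a hypothetical figure, and the 50 ms case is an assumed cost for a handler that only verifies the signature and enqueues the payload; real numbers depend on your stack.

```latex
\lambda_{\max} = \frac{N}{W}, \qquad
\frac{20}{5\,\text{s}} = 4\ \text{requests/s}, \qquad
\frac{20}{0.05\,\text{s}} = 400\ \text{requests/s}
```

Cutting the time spent inside the request handler is what raises the ceiling, which is exactly what moving the heavy work to background workers achieves.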

Instead:

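  1. Accept the webhook payload, keeping the raw request body.
  2. Quickly validate the signature against the raw body, and reject the request (for example, with 401 Unauthorized) if it does not match.
  3. Queue the payload into a message broker (like Redis, RabbitMQ, or SQS).
  4. Return a 200 OK or 202 Accepted response as soon as the payload is queued. If queueing fails, return a 5xx error so the provider retries.
  5. Process the queue asynchronously using background workers.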
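The receiving side of these steps can be sketched in Python. This is a minimal, framework-agnostic sketch under several assumptions: the provider signs the raw body with HMAC-SHA256 and sends the hex digest in a header (real schemes vary; Stripe, for instance, also signs a timestamp, so follow your provider's documentation), WEBHOOK_SECRET is a hypothetical placeholder, and an in-process queue.Queue stands in for a durable broker. The function names are illustrative, not part of any library.

```python
import hashlib
import hmac
import json
import queue

# Hypothetical secret; in practice load it from configuration, never hard-code it.
WEBHOOK_SECRET = b"whsec_example"

# Stand-in for a durable broker (Redis, RabbitMQ, SQS). An in-process
# queue.Queue is NOT durable and only illustrates the control flow.
event_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)


def verify_signature(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Assumed scheme: hex-encoded HMAC-SHA256 of the raw request body."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking timing info.
    return hmac.compare_digest(expected, signature_hex)


def receive_webhook(raw_body: bytes, signature_hex: str) -> int:
    """Endpoint logic only: verify, enqueue, and return the HTTP status code."""
    if not verify_signature(raw_body, signature_hex, WEBHOOK_SECRET):
        return 401  # Unsigned or tampered request.
    try:
        payload = json.loads(raw_body)
    except ValueError:  # Covers JSONDecodeError and undecodable bytes.
        return 400
    try:
        event_queue.put_nowait(payload)
    except queue.Full:
        # Could not queue the event: a 5xx tells the provider to retry later.
        return 503
    return 202  # Acknowledge quickly; background workers do the real work.
```

Pass the raw request bytes to receive_webhook, not a parsed and re-serialized JSON object: re-serializing can change whitespace or key order and make a valid signature fail to verify.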

2. Make Your Handlers Idempotent

Webhook providers such as Stripe offer at-least-once delivery. This means you will occasionally receive the same webhook event more than once.

To prevent double-billing or duplicate email sends:

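  • Most providers include a unique event ID in each delivery (Stripe, for example, sets an id on every event object).
  • Record processed event IDs in a database table with a unique constraint, or in a cache using an atomic set-if-absent operation such as Redis SET with the NX option.
  • Before processing a webhook, check whether its ID has already been marked as complete. To avoid a race where two workers pick up the same event at once, make the check and the write atomic, for example by inserting the ID in the same database transaction as the processing itself.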
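One way to make the check and the write atomic is sketched below in Python, using SQLite from the standard library as a stand-in for your database. The table names and the email-log side effect are hypothetical. The event ID is inserted in the same transaction as the side effect, so a duplicate delivery fails on the primary key and the whole transaction rolls back without repeating the work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    CREATE TABLE emails_sent (event_id TEXT, recipient TEXT);
    """
)


def handle_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply an event's effects at most once; return False for a duplicate."""
    try:
        # "with conn" commits on success and rolls back on any exception,
        # so the event ID and the side effect are stored together or not at all.
        with conn:
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["id"],),
            )
            # Hypothetical side effect, recorded in the same database.
            conn.execute(
                "INSERT INTO emails_sent (event_id, recipient) VALUES (?, ?)",
                (event["id"], event["email"]),
            )
    except sqlite3.IntegrityError:
        # Primary key violation: this event ID was already processed.
        return False
    return True
```

This pattern fully protects only side effects that live in the same database. External calls, such as actually sending an email, cannot be rolled back, so they need separate handling, for example forwarding the event ID to a downstream API that accepts idempotency keys.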

3. Handle Retries Internally

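If a worker fails to process an event, retry it internally rather than relying on the webhook provider. Because your endpoint has already returned a 2xx response, the provider considers the event delivered and will not send it again.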

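  • Build a retry queue with exponential backoff and a maximum number of attempts; move events that exhaust their attempts to a dead-letter queue for inspection.
  • Monitor the queue size and alert developers if webhooks have been failing to process for more than an hour.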
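Exponential backoff with jitter can be sketched in Python as follows. For simplicity this retries in-process with a sleep; a production retry queue would more likely re-enqueue the failed event with a delay. The function names, the 1-second base, the 300-second cap, and the 5-attempt limit are hypothetical choices, not taken from any particular library.

```python
import random
import time
from typing import Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def process_with_retries(
    handler: Callable[[dict], None],
    event: dict,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call handler(event), retrying failures with exponential backoff.

    Returns True on success, or False once max_attempts is exhausted, at
    which point the caller can move the event to a dead-letter queue.
    """
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception:
            if attempt == max_attempts - 1:
                return False  # Out of attempts: give up on this event.
            sleep(backoff_delay(attempt))
    return False
```

Jitter spreads retries out, so that many events failing at once (for example, during a downstream outage) do not all retry at the same instant. The sleep parameter is injectable so the retry logic can be tested without actually waiting.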

By separating reception from processing, enforcing idempotency on event IDs, and retrying failures through internal queues, you make your webhook system able to withstand heavy traffic spikes and network failures while greatly reducing the risk of losing data.