Retry and Idempotency for Webhook Delivery

Webhook delivery is fundamentally at-least-once: the sender retries until it sees a success response, which means a receiver that processes an event but fails to reply in time will get the same event again. If your handler charges a card or sends an email on every delivery, those duplicates become double charges or duplicate emails. The fix is a documented retry policy plus an idempotency key the consumer uses to deduplicate. This guide shows how to express both in OpenAPI 3.1 and build the consumer-side logic. It is part of the Webhook & Callback Definitions section and the broader OpenAPI & AsyncAPI Schema Authoring framework. By the end you will have a spec fragment documenting the contract and a receiver that applies each event's side effects exactly once, no matter how many times the event is delivered.

Figure: Webhook retry and idempotent deduplication. The sender retries a delivery with exponential backoff (1s, 2s, 4s, …). The consumer looks up the idempotency key, applies the side effect for a new key, skips it for a duplicate, and acknowledges with 200 on both paths to stop further retries.

Problem & Context

Networks fail, processes restart, and timeouts fire. To deliver reliably, webhook senders adopt at-least-once semantics: keep retrying a delivery on a backoff schedule until the receiver returns a success code or the retry budget runs out. That guarantee is valuable, but it pushes a hard requirement onto the consumer. Consider the race that bites every team eventually: your handler inserts an order, commits, then the load balancer kills the connection before the 200 reaches the sender. The sender, seeing no acknowledgment, retries — and now you have two orders for one event.

The only robust answer is idempotency. The sender attaches a stable key that is identical across every retry of the same event, and the consumer records which keys it has already applied. The first time a key arrives, the handler does the work and stores the key; every subsequent time, it recognizes the key and returns success without repeating the side effect. This converts at-least-once delivery into effectively exactly-once processing.
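A toy, in-memory simulation makes the difference concrete. The names here (`deliverWithLostFirstAck` and the two consumer factories) are hypothetical, and the lost acknowledgment is modeled simply by calling the handler a second time with the same event, as a retrying sender would.

```javascript
// Models the "lost ack" race: the consumer applies the event, the sender
// never sees the 200, and it redelivers the identical event.
function deliverWithLostFirstAck(handler, event) {
  handler(event);        // attempt 1: processed, but the ack is lost
  return handler(event); // attempt 2: the sender retries the same event
}

// Naive consumer: applies the side effect on every delivery.
function makeNaiveConsumer() {
  const orders = [];
  return {
    orders,
    handle: (event) => {
      orders.push(event.data.orderId);
      return 200;
    },
  };
}

// Idempotent consumer: remembers applied keys and skips repeats.
function makeIdempotentConsumer() {
  const orders = [];
  const applied = new Set();
  return {
    orders,
    handle: (event) => {
      if (applied.has(event.id)) return 200; // duplicate: acknowledge, no side effect
      orders.push(event.data.orderId);
      applied.add(event.id);
      return 200;
    },
  };
}

const event = { id: 'evt-1', data: { orderId: 'o_1' } };
const naive = makeNaiveConsumer();
deliverWithLostFirstAck(naive.handle, event);
const idem = makeIdempotentConsumer();
deliverWithLostFirstAck(idem.handle, event);
console.log(naive.orders.length, idem.orders.length); // 2 1
```

Both consumers return 200 both times, so the sender is satisfied either way; only the keyed consumer avoids the second order.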

The spec is where this contract is published, so subscribers know the retry schedule (and therefore how long to retain keys) and which header carries the idempotency key. As with securing webhooks with signature verification, OpenAPI has no core object for this behavior: the idempotency key is declared as a header parameter, and the retry policy lives in the operation description plus an x- extension that your portal renders.

Step-by-Step Solution

1. Document the retry policy in the spec

Use an x-webhook-retry extension on the operation to publish the machine-readable policy, and restate it in the description for the portal. Document which responses trigger a retry: timeouts and the codes listed in retry-on (408, 429, 500, 502, 503, and 504) are retried, while other 4xx responses are permanent failures and are not.
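The retry rule can be sketched as a small classifier that a sender, or a test harness standing in for one, might apply to each delivery result. This assumes the retry-on codes from the spec fragment that follows and uses `null` to mean a timeout or dropped connection; `shouldRetry` is a hypothetical helper, not part of any library.

```javascript
// Retryable statuses, mirroring the retry-on list in x-webhook-retry.
const RETRY_ON = new Set([408, 429, 500, 502, 503, 504]);

// status: an HTTP status code, or null for a timeout / no response (assumption).
function shouldRetry(status) {
  if (status === null) return true;                // timeout or dropped connection
  if (status >= 200 && status < 300) return false; // acknowledged
  return RETRY_ON.has(status);                     // listed codes retry; all others are permanent
}

console.log(shouldRetry(503), shouldRetry(404), shouldRetry(null)); // true false true
```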

# openapi.yaml (OpenAPI 3.1.0)
openapi: 3.1.0
info:
  title: Orders API
  version: 1.0.0
webhooks:
  order.created:
    post:
      operationId: onOrderCreated
      summary: Fired when an order is created
      description: >
        Deliveries are retried with exponential backoff (1s, 2s, 4s, 8s, …,
        capped at 1h) for up to 36 attempts, the last about 24h after the first.
        Return 2xx to acknowledge. A 408, 429, 500, 502, 503, or 504 response,
        or a timeout, triggers a retry; any other 4xx or 5xx response is treated
        as a permanent failure and is not retried. Each retry carries the same
        Idempotency-Key, so process each key exactly once.
      x-webhook-retry:
        strategy: exponential
        base-delay-seconds: 1
        max-delay-seconds: 3600
        max-attempts: 36
        retry-on: [408, 429, 500, 502, 503, 504]
      parameters:
        - $ref: '#/components/parameters/IdempotencyKey'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/OrderEvent'
      responses:
        '200':
          description: Event acknowledged (new or duplicate)
        '400':
          description: Malformed or unsupported event; sender will not retry
        '500':
          description: Transient failure; sender will retry
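It is worth checking that the published numbers describe the window you claim, since consumers size their key retention from it. The sketch below assumes `max-attempts` counts the first delivery and that retry k waits `min(base * 2 ** (k - 1), max)` seconds, ignoring jitter and request time; `retrySchedule` is a hypothetical helper.

```javascript
// Delay (seconds) before each retry, plus the total span from first to last attempt.
function retrySchedule({ baseDelaySeconds, maxDelaySeconds, maxAttempts }) {
  const delays = [];
  // maxAttempts includes the initial delivery, so there are maxAttempts - 1 retries.
  for (let retry = 1; retry < maxAttempts; retry++) {
    delays.push(Math.min(baseDelaySeconds * 2 ** (retry - 1), maxDelaySeconds));
  }
  const windowSeconds = delays.reduce((sum, d) => sum + d, 0);
  return { delays, windowSeconds };
}

const short = retrySchedule({ baseDelaySeconds: 1, maxDelaySeconds: 3600, maxAttempts: 8 });
console.log(short.windowSeconds); // 127
const day = retrySchedule({ baseDelaySeconds: 1, maxDelaySeconds: 3600, maxAttempts: 36 });
console.log(day.delays.length, day.windowSeconds); // 35 86895
console.log((day.windowSeconds / 3600).toFixed(1)); // 24.1
```

With a 1s base and a 1h cap, 8 attempts span only 127 seconds because the cap is never reached; reaching a window of about 24h with the same base and cap takes 36 attempts.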

2. Declare the Idempotency-Key header as a reusable parameter

components:
  parameters:
    IdempotencyKey:
      name: Idempotency-Key
      in: header
      required: true
      description: >
        Stable identifier for this event, identical across all retries.
        Matches the id field in the payload. Consumers must deduplicate on it.
      schema:
        type: string
        format: uuid
  schemas:
    OrderEvent:
      type: object
      required: [id, type, created, data]
      properties:
        id: { type: string, format: uuid }
        type: { type: string, const: order.created }
        created: { type: integer, format: int64 }
        data:
          type: object
          properties:
            orderId: { type: string }
            total: { type: integer }
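On the consumer side, the header and payload rules above can be enforced before any processing. This is a hand-rolled sketch that assumes you are not running a JSON Schema validator; `checkDelivery` and its reason strings are hypothetical, and in practice you might validate the body against OrderEvent with a schema library instead.

```javascript
// Hypothetical checks for the Idempotency-Key header and the OrderEvent body.
// Not a full JSON Schema validator: only the rules that matter for dedup.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function checkDelivery(headerKey, body) {
  if (typeof headerKey !== 'string' || !UUID_RE.test(headerKey)) {
    return { ok: false, reason: 'invalid_idempotency_key' };
  }
  if (body === null || typeof body !== 'object') {
    return { ok: false, reason: 'invalid_body' };
  }
  for (const field of ['id', 'type', 'created', 'data']) {
    if (!(field in body)) return { ok: false, reason: `missing_${field}` };
  }
  if (body.type !== 'order.created') {
    return { ok: false, reason: 'unexpected_type' };
  }
  // The header must match the payload id; UUIDs compare case-insensitively
  // (uuidgen may print uppercase).
  if (typeof body.id !== 'string' || body.id.toLowerCase() !== headerKey.toLowerCase()) {
    return { ok: false, reason: 'key_id_mismatch' };
  }
  return { ok: true };
}
```

Any `ok: false` result is a permanent problem, so the handler would answer 400 rather than 500 and the sender would stop retrying.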

Validate that the spec parses:

npx @redocly/cli@2 lint openapi.yaml

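Exact messages vary with the Redocly version and ruleset. The default recommended rules may also report style findings (for example, a missing info.license) that are separate from whether the document parses. A run with no errors ends like this (timing varies):

validating openapi.yaml...
openapi.yaml: validated in <N>ms

Woohoo! Your API description is valid. 🎉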

3. Persist processed keys with a unique constraint

The deduplication store must enforce uniqueness atomically. A unique index turns “have I seen this key?” into a single insert that either succeeds (new) or conflicts (duplicate) — no read-then-write race.

CREATE TABLE processed_webhooks (
  idempotency_key TEXT PRIMARY KEY,
  processed_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Expire keys older than the retry window (run periodically):
DELETE FROM processed_webhooks WHERE processed_at < now() - INTERVAL '72 hours';

4. Deduplicate before applying side effects

Attempt the insert first. ON CONFLICT DO NOTHING makes the check atomic: if zero rows were inserted, the key was already processed and you skip the work.

async function markProcessed(client, key) {
  const result = await client.query(
    'INSERT INTO processed_webhooks (idempotency_key) VALUES ($1) ' +
    'ON CONFLICT (idempotency_key) DO NOTHING',
    [key]
  );
  return result.rowCount === 1; // true = first time, false = duplicate
}

Do the dedup insert and the business write in the same database transaction so a crash between them cannot leave a recorded key with no order (or an order with no recorded key).
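A sketch of that transaction, assuming a node-postgres-style client whose `query(sql, params)` resolves to an object with `rowCount`. The `orders` table, its columns, and `handleOrderCreated` are hypothetical; the dedup statement is the same one `markProcessed` runs.

```javascript
// Dedup insert and business write share ONE transaction on ONE client.
// Assumes a node-postgres-style client; the orders table is hypothetical.
async function handleOrderCreated(client, event) {
  await client.query('BEGIN');
  try {
    const inserted = await client.query(
      'INSERT INTO processed_webhooks (idempotency_key) VALUES ($1) ' +
      'ON CONFLICT (idempotency_key) DO NOTHING',
      [event.id]
    );
    if (inserted.rowCount === 0) {
      // Key already committed by an earlier delivery: skip the side effect.
      await client.query('COMMIT');
      return { duplicate: true };
    }
    // Business write: commits or rolls back together with the key.
    await client.query(
      'INSERT INTO orders (order_id, total) VALUES ($1, $2)',
      [event.data.orderId, event.data.total]
    );
    await client.query('COMMIT');
    return { duplicate: false };
  } catch (err) {
    await client.query('ROLLBACK'); // discards the key too, so a retry is processed normally
    throw err;                      // caller maps this to 500 so the sender retries
  }
}
```

Run all statements on one checked-out client rather than `pool.query`, because a transaction is bound to a single connection. Under PostgreSQL's default isolation, a concurrent transaction inserting the same key waits for the first one to finish, then either hits the conflict (if the first committed) or inserts (if it rolled back).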

5. Return the right status codes

Return 200 for both new and duplicate events so the sender stops retrying. Return 500 only for transient failures you want retried; return 400 for malformed payloads you will never accept, so the sender stops wasting attempts. Send a delivery, then resend it with the same key:

KEY=$(uuidgen)
for i in 1 2; do
  curl -s -o /dev/null -w "attempt $i -> %{http_code}\n" \
    -X POST http://localhost:3000/webhooks/orders \
    -H 'Content-Type: application/json' \
    -H "Idempotency-Key: $KEY" \
    -d '{"id":"'"$KEY"'","type":"order.created","created":1,"data":{"orderId":"o_1","total":500}}'
done

Expected output (processed once, acknowledged twice):

attempt 1 -> 200
attempt 2 -> 200

The server logs show processing order o_1 exactly once.
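The status rules can be centralized in one wrapper so every handler maps outcomes the same way. This sketch assumes a hypothetical `PermanentError` type that handlers throw for payloads they will never accept; any other error is treated as transient.

```javascript
// Hypothetical marker for failures that will never succeed on retry.
class PermanentError extends Error {}

// Maps a handler outcome to the status the sender acts on.
// handler(event) is assumed to resolve to { duplicate: boolean }.
async function respondTo(handler, event) {
  try {
    const { duplicate } = await handler(event);
    // New or duplicate: 200 either way, so the sender stops retrying.
    return { status: 200, body: { received: true, duplicate } };
  } catch (err) {
    if (err instanceof PermanentError) {
      // Malformed or unsupported: 400 so the sender stops wasting attempts.
      return { status: 400, body: { error: err.message } };
    }
    // Anything else is treated as transient: 500 triggers a retry with the same key.
    return { status: 500, body: { error: 'transient_failure' } };
  }
}
```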

Complete Working Example

A self-contained server.js using Express and an in-memory store (swap the Map for the SQL table above in production). Install Express with npm install express, then run node server.js.

// server.js — run: node server.js   (Node 18+, express only)
const express = require('express');
const app = express();
app.use(express.json());

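// In production replace this Map with the processed_webhooks table.
const seen = new Map(); // key -> processedAt (ms)
const RETENTION_MS = 72 * 60 * 60 * 1000; // 72h, covers the retry window

function isDuplicate(key) {
  const at = seen.get(key);
  return at !== undefined && Date.now() - at < RETENTION_MS;
}

// Evict expired keys hourly so the Map does not grow without bound
// (the in-memory equivalent of the periodic DELETE above).
setInterval(() => {
  const cutoff = Date.now() - RETENTION_MS;
  for (const [key, at] of seen) {
    if (at < cutoff) seen.delete(key);
  }
}, 60 * 60 * 1000).unref();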

app.post('/webhooks/orders', async (req, res) => {
  const key = req.get('Idempotency-Key');
  if (!key) {
    return res.status(400).json({ error: 'missing_idempotency_key' });
  }

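  // Check-then-set is safe here only because nothing between this check and
  // seen.set() awaits; with an async side effect, two concurrent deliveries of
  // the same key could both pass. In a database, use INSERT ... ON CONFLICT
  // inside the same transaction as the business write.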
  if (isDuplicate(key)) {
    // Already applied — acknowledge so the sender stops retrying.
    return res.status(200).json({ received: true, duplicate: true });
  }

  try {
    const event = req.body;
    // --- business side effect happens exactly once ---
    console.log('processing order', event.data && event.data.orderId);
    // Record the key only AFTER the side effect succeeds.
    seen.set(key, Date.now());
    return res.status(200).json({ received: true, duplicate: false });
  } catch (err) {
    // Transient failure: 5xx tells the sender to retry with the same key.
    console.error('handler failed, will be retried', err);
    return res.status(500).json({ error: 'transient_failure' });
  }
});

app.listen(3000, () => console.log('listening on :3000'));

Send the same event twice as shown in step 5; the side effect runs once and both attempts return 200.

Gotchas & Edge Cases

Recording the key before the side effect succeeds. If you store the idempotency key first and the business write then fails, a retry will see the key, treat the event as done, and silently drop it. Perform the side effect and record the key inside one transaction whenever you can. If the side effect cannot join a transaction (an email, say), record the key only after it succeeds, never before; the remaining risk is a crash between the two steps, which causes one extra processing rather than a lost event.

Retention shorter than the retry window. If the sender retries for 24 hours but you expire keys after 1 hour, a late retry looks new and gets reprocessed. Set the key TTL to at least the sender’s full retry budget plus a margin; the policy you publish in x-webhook-retry is what tells consumers how long that window is.
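The rule can be checked mechanically from the published policy. This sketch assumes a plain exponential backoff model (retry k waits `min(base * 2 ** (k - 1), cap)` seconds, no jitter), an example policy of a 1s base, 1h cap, and 36 attempts, and a 6h margin; `minRetentionSeconds` and `retentionIsSafe` are hypothetical names.

```javascript
// Minimum key TTL: the sender's full retry window plus a safety margin.
function minRetentionSeconds(policy, marginSeconds) {
  const { baseDelaySeconds, maxDelaySeconds, maxAttempts } = policy;
  let windowSeconds = 0;
  for (let k = 1; k < maxAttempts; k++) { // maxAttempts includes the first delivery
    windowSeconds += Math.min(baseDelaySeconds * 2 ** (k - 1), maxDelaySeconds);
  }
  return windowSeconds + marginSeconds;
}

function retentionIsSafe(ttlSeconds, policy, marginSeconds) {
  return ttlSeconds >= minRetentionSeconds(policy, marginSeconds);
}

const policy = { baseDelaySeconds: 1, maxDelaySeconds: 3600, maxAttempts: 36 };
const MARGIN = 6 * 3600; // assumed 6h safety margin
console.log(minRetentionSeconds(policy, MARGIN));        // 108495
console.log(retentionIsSafe(72 * 3600, policy, MARGIN)); // true  (72h TTL)
console.log(retentionIsSafe(1 * 3600, policy, MARGIN));  // false (1h TTL)
```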

Returning 5xx for permanent failures. A malformed or unsupported payload returned as 500 will be retried for the entire window, wasting both sides’ resources and flooding your logs. Return 400 for anything you will never successfully process, and reserve 5xx strictly for transient errors you genuinely want retried.

FAQ

Why do webhook deliveries need an idempotency key if they are already retried?

Retries are exactly the reason an idempotency key is needed: a receiver can process an event and then fail to return 200 before the sender's request times out, so the sender retries and the same event arrives twice. The idempotency key lets the consumer recognize the duplicate and skip the side effect the second time, turning at-least-once delivery into effectively exactly-once processing.

Should the idempotency key be the event ID or a separate header?

Use a stable per-event identifier that never changes across retries, exposed both in the payload as an event id and in an Idempotency-Key header. The header lets a gateway or middleware deduplicate before the body is parsed, while the payload id remains the durable key your handler stores.

How long should a consumer remember processed idempotency keys?

Store each processed key at least as long as the sender’s total retry window plus a safety margin, commonly 24 to 72 hours. Keys older than the longest possible retry can be expired with a TTL because the sender will never redeliver them after that point.