How to detect lost Stripe webhooks
Most teams do not discover lost Stripe webhooks from monitoring. They discover them from angry customers.
- "I paid but my account is still locked"
- "I canceled last week and was billed again"
- "Refund completed in Stripe, but your app still shows active"
By the time support sees this, the real failure has already happened: Stripe generated an event, but your system did not process it correctly.
This guide gives you a fast, repeatable way to detect lost webhooks before they become revenue leakage.
What "lost webhook" actually means
A webhook can be "lost" in several ways:
- Stripe generated the event, but your endpoint never received it.
- Your endpoint received it, but returned a 4xx/5xx and retries never recovered.
- Your endpoint returned 200, but your handler failed internally after acknowledgment.
- Your handler ran, but the resulting state mutation failed (for example, a DB timeout).
From a business perspective, all four lead to the same outcome: Stripe and your product state diverge.
Step 1: define critical events and expected state transitions
Start with a short table of events that directly affect money and access.
| Stripe event | Expected app action |
| --- | --- |
| checkout.session.completed | provision account / grant plan |
| invoice.paid | extend subscription period |
| invoice.payment_failed | mark account at risk / trigger dunning |
| customer.subscription.deleted | revoke or downgrade access |
| charge.refunded | apply refund state and entitlement rules |
If your team cannot state this mapping clearly, detection will always be noisy.
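One way to keep this mapping honest is to store it as data that both the webhook handler and the reconciliation job read. The sketch below is a minimal illustration: the event types come from the table above, while the action names and the `expected_action` helper are hypothetical labels, not part of any Stripe API.

```python
from typing import Dict, Optional

# Event types are taken from the table above.
# The action names on the right are hypothetical labels for your own code paths.
CRITICAL_EVENTS: Dict[str, str] = {
    "checkout.session.completed": "provision_account",
    "invoice.paid": "extend_subscription_period",
    "invoice.payment_failed": "mark_at_risk",
    "customer.subscription.deleted": "revoke_access",
    "charge.refunded": "apply_refund",
}


def expected_action(event_type: str) -> Optional[str]:
    """Return the expected app action, or None if the event is not critical."""
    return CRITICAL_EVENTS.get(event_type)
```

Anything that returns `None` here can be excluded from divergence alerts, which keeps the signal focused on money and access.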
Step 2: compare Stripe truth vs app truth on a schedule
Run a scheduled reconciliation job (at least daily, ideally every 15 minutes) that asks:
- Did Stripe emit a critical event?
- Do we have a matching internal mutation?
- If yes, how long did it take?
At minimum, persist these fields for each webhook attempt:
- event id (evt_...)
- event type
- response status
- delivery timestamp
- processing result
- trace id / correlation id
Without this data, every incident becomes guesswork.
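As a rough illustration, the fields above could be persisted in a single attempts table. This sketch uses SQLite from the Python standard library purely so it is self-contained; the table name, column names, and helper functions are assumptions, and a production system would likely use its primary database instead.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per delivery attempt, not per event.
SCHEMA = """
CREATE TABLE IF NOT EXISTS webhook_attempts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id TEXT NOT NULL,           -- Stripe event id (evt_...)
    event_type TEXT NOT NULL,
    response_status INTEGER NOT NULL, -- HTTP status we returned to Stripe
    delivered_at TEXT NOT NULL,       -- ISO 8601 timestamp, UTC
    processing_result TEXT NOT NULL,  -- e.g. "ok" or "error"
    trace_id TEXT                     -- correlation id, if available
);
CREATE INDEX IF NOT EXISTS idx_attempts_event ON webhook_attempts (event_id);
"""


def open_store(path=":memory:"):
    """Open the attempt store and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_attempt(conn, event_id, event_type, response_status,
                   processing_result, trace_id=None, delivered_at=None):
    """Insert one delivery attempt; defaults the timestamp to now (UTC)."""
    delivered_at = delivered_at or datetime.now(timezone.utc)
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO webhook_attempts "
            "(event_id, event_type, response_status, delivered_at, "
            "processing_result, trace_id) VALUES (?, ?, ?, ?, ?, ?)",
            (event_id, event_type, response_status,
             delivered_at.isoformat(), processing_result, trace_id),
        )


def attempts_for(conn, event_id):
    """Return (response_status, processing_result) per attempt, oldest first."""
    return conn.execute(
        "SELECT response_status, processing_result FROM webhook_attempts "
        "WHERE event_id = ? ORDER BY id",
        (event_id,),
    ).fetchall()
```

Keeping one row per attempt, rather than one row per event, preserves the retry history you need when an incident review asks what happened on each delivery.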
Step 3: alert on divergence, not just on HTTP failures
HTTP 500 alerts are useful but insufficient.
Add divergence alerts such as:
- "Stripe cancellation found, but local subscription still active after 10 min"
- "Stripe payment succeeded, but no entitlement granted"
- "Refund event processed in Stripe, but invoice state unchanged"
This catches silent failure modes where your endpoint still returns 200.
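A divergence check of this kind can be sketched as a pure function. It assumes you have already fetched recent Stripe events as dicts (each Stripe event carries `id`, `type`, and a Unix `created` timestamp) and that you can produce the set of event ids for which your app recorded a matching mutation. The `Divergence` type, the function name, and the 10-minute default grace period are illustrative choices.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set

# Critical event types, matching the mapping table earlier in this guide.
CRITICAL_TYPES: FrozenSet[str] = frozenset({
    "checkout.session.completed",
    "invoice.paid",
    "invoice.payment_failed",
    "customer.subscription.deleted",
    "charge.refunded",
})


@dataclass(frozen=True)
class Divergence:
    event_id: str
    event_type: str
    age_seconds: int


def find_divergences(stripe_events: Iterable[dict],
                     applied_event_ids: Set[str],
                     now: int,
                     grace_seconds: int = 600) -> List[Divergence]:
    """Return critical events older than the grace period with no local mutation.

    `now` and each event's `created` field are Unix timestamps in seconds.
    """
    found = []
    for event in stripe_events:
        if event["type"] not in CRITICAL_TYPES:
            continue  # not money- or access-affecting
        if event["id"] in applied_event_ids:
            continue  # the app already applied this event
        age = now - event["created"]
        if age >= grace_seconds:
            found.append(Divergence(event["id"], event["type"], age))
    return found
```

Because the check looks at whether the mutation happened rather than what status code was returned, it also flags events that your endpoint acknowledged with a 200 but never applied.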
Step 4: measure the dollar impact
Engineering alerts get ignored. Revenue impact gets prioritized.
For each unresolved divergence, estimate impact in cents and aggregate:
- total at risk
- recovered amount
- unresolved critical count
- top divergence types by value
Once leadership sees "we have $12,400 at risk from missed payment webhooks," these bugs move from backlog to roadmap.
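The aggregation itself is simple arithmetic over integer cents. The sketch below assumes each divergence record is a dict with hypothetical `event_type`, `amount_cents`, and `resolved` keys; how you derive `amount_cents` for each event type is up to your own entitlement rules.

```python
from collections import Counter
from typing import Iterable


def summarize_impact(divergences: Iterable[dict], top_n: int = 3) -> dict:
    """Aggregate divergence records into the four headline numbers."""
    at_risk_cents = 0
    recovered_cents = 0
    unresolved_count = 0
    unresolved_by_type = Counter()

    for d in divergences:
        if d["resolved"]:
            recovered_cents += d["amount_cents"]
        else:
            at_risk_cents += d["amount_cents"]
            unresolved_count += 1
            unresolved_by_type[d["event_type"]] += d["amount_cents"]

    return {
        "total_at_risk_cents": at_risk_cents,
        "recovered_cents": recovered_cents,
        "unresolved_count": unresolved_count,
        # List of (event_type, cents) pairs, largest value first.
        "top_types_by_value": unresolved_by_type.most_common(top_n),
    }


def format_dollars(cents: int) -> str:
    """Render integer cents as a dollar string for reports."""
    return f"${cents / 100:,.2f}"
```

Summing in integer cents and converting to dollars only for display avoids floating-point rounding drift in the totals.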
Step 5: close the loop with replay
Detection without recovery creates more toil.
For every divergence, you should be able to:
- inspect original payload and delivery attempts,
- replay the exact event to your endpoint,
- track whether replay resolved the issue.
Replay is how you convert observability into recovered revenue.
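A replay loop might look like the sketch below. It makes a simplifying assumption: the stored raw payload is fed to your handler function in-process rather than re-sent over HTTP, which sidesteps signature verification. The `handler` and `is_resolved` callables are hypothetical hooks that you would supply.

```python
import json
from typing import Callable


def replay_event(raw_payload: bytes,
                 handler: Callable[[dict], None],
                 is_resolved: Callable[[str], bool]) -> dict:
    """Re-run a stored event through the handler and report the outcome.

    `handler` is your normal event-processing function.
    `is_resolved` checks whether app state now matches Stripe for this event.
    """
    event = json.loads(raw_payload)
    error = None
    try:
        handler(event)
    except Exception as exc:  # record the failure instead of crashing the job
        error = repr(exc)
    return {
        "event_id": event["id"],
        "handler_error": error,
        # Only count the replay as successful if state actually converged.
        "resolved": error is None and is_resolved(event["id"]),
    }
```

Checking `is_resolved` after the handler runs matters: a replay that raises no error but leaves state unchanged should stay on the unresolved list.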
A practical baseline you can ship this week
If you need a concrete first milestone, ship this:
- persist raw webhook payload + headers before processing,
- store attempt status and latency,
- run a scheduled reconciliation for critical Stripe events,
- alert when Stripe state and app state diverge,
- support one-click replay.
That baseline is enough to prevent most "we had no idea this was broken" incidents.
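The first item in that list, persisting before processing, can be sketched as an inbox table that the endpoint writes to before it acknowledges. This is a minimal illustration with assumed table and function names. It uses SQLite only to stay self-contained, and it omits signature verification, which a real endpoint would perform before trusting the payload.

```python
import json
import sqlite3


def open_inbox(path=":memory:"):
    """Open the inbox store and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS webhook_inbox ("
        "event_id TEXT PRIMARY KEY, "
        "raw_body BLOB NOT NULL, "
        "headers_json TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def ingest(conn, raw_body: bytes, headers: dict) -> int:
    """Durably store the raw event, then return the HTTP status to send.

    Processing happens later, from the inbox, so a handler crash cannot
    lose an event that was already acknowledged.
    """
    try:
        event_id = json.loads(raw_body)["id"]
    except (ValueError, KeyError, TypeError):
        return 400  # body is not a JSON object with an "id" field
    with conn:  # commit before acknowledging
        # OR IGNORE makes redelivery of the same event id a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO webhook_inbox "
            "(event_id, raw_body, headers_json) VALUES (?, ?, ?)",
            (event_id, raw_body, json.dumps(headers)),
        )
    return 200


def pending(conn):
    """Return ids of events stored but not yet processed, oldest first."""
    rows = conn.execute(
        "SELECT event_id FROM webhook_inbox "
        "WHERE status = 'pending' ORDER BY rowid"
    )
    return [row[0] for row in rows]
```

Keying the inbox on the event id means Stripe retries and manual replays of the same event collapse into one row instead of being stored twice.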
Final checklist
- [ ] Critical event mapping documented
- [ ] Durable webhook ingestion enabled
- [ ] Reconciliation job running on schedule
- [ ] Divergence alerts configured
- [ ] Replay workflow tested end to end
- [ ] Revenue impact reported weekly
If you want this without building custom infra first, Revenue Recovery Autopilot gives you scanner + monitor + recovery workflows on top of your Stripe webhook flow, so your team can detect and resolve revenue divergences fast.
Start here: https://katsuralabs.com