Idempotency in event-driven systems: how I think about it

A practical guide to idempotency in event-driven systems, including retries, duplicate delivery, state changes, and why small teams should design for it early.

Nov 28, 2025 3 min read aws event-driven serverless architecture small-teams

Event-driven systems become much easier to love once the happy path works.

They become much harder to operate once retries, partial failures, and duplicate delivery start showing up in real workloads.

That is why I think idempotency is one of the first design questions that matters in event-driven systems, not one of the last cleanup tasks.

Idempotency is not optional once retries exist

If a system retries work, duplicate execution is already part of the design whether the team has modeled it clearly or not.

That means the question is not whether a handler might run twice. The question is whether running twice creates damage, confusion, or expensive cleanup.

For small teams, this matters because ambiguity gets expensive quickly. A system that “usually works” but occasionally creates duplicate side effects is one of the hardest kinds of systems to trust.

Start with side effects, not theory

I do not start by asking whether the function or consumer is technically idempotent. I start by asking what damage repeated execution can cause.

That usually means looking at side effects like:

duplicate emails or notifications
repeated billing actions
repeated state transitions
inventory or quota changes happening twice
downstream events emitted more than once

Those are the places where idempotency needs to be deliberate.

Good idempotency design makes repeated work harmless

The goal is not to prevent all retries. The goal is to make repeated execution safe enough that retries stop being scary.

That usually means one or more of these patterns:

recording processed event identifiers
using natural business keys where duplicates can be recognized
making state transitions conditional
checking whether the intended result already exists before doing the work again
structuring side effects so they can be replayed safely

The exact technique matters less than the mindset. Repetition should not create a new outcome when the system is processing the same intent.
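
To make the first pattern in the list concrete, here is a minimal sketch of recording processed event identifiers. It assumes a SQLite table as the dedupe store; the names `processed_events`, `make_store`, and `handle_once` are hypothetical and only for illustration. It also assumes the side effect can live in the same transaction as the marker, which is not true for external effects like sending an email.

```python
import sqlite3
from typing import Callable


def make_store() -> sqlite3.Connection:
    # Assumption: an in-memory SQLite database stands in for a durable store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    return conn


def handle_once(
    conn: sqlite3.Connection, event_id: str, side_effect: Callable[[], None]
) -> bool:
    """Run side_effect only the first time event_id is seen.

    Returns True if the side effect ran, False if the event was a duplicate.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        try:
            # The primary key rejects a second insert of the same event id.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
        except sqlite3.IntegrityError:
            return False  # already processed: repeating is a no-op
        # If this raises, the marker insert is rolled back so a retry can run.
        side_effect()
    return True
```

Calling `handle_once(conn, "evt-1", send_email)` twice with the same identifier runs `send_email` once. The design choice worth noticing is that the marker and the work succeed or fail together, so a failed attempt does not leave the event marked as done.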

Idempotency boundaries should match ownership boundaries

This is one place where event-driven systems often get fuzzy.

Teams talk about idempotency as if it is a generic infrastructure concern, but in practice it usually lives at the boundary where a service owns a side effect or state change.

That means the team should be able to say:

what outcome this consumer owns
how it recognizes duplicate intent
what it considers “already done”
how that decision is persisted

If those answers are vague, the idempotency model is probably vague too.
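
One way to check whether those four answers are concrete is to see whether they fit in a few lines of code. The sketch below assumes a hypothetical consumer that owns the "order shipped" transition: duplicate intent is recognized by the order id, "already done" means the order is no longer in the `paid` state, and the decision is persisted in the order row itself. The `orders` table, its statuses, and the function names are all assumptions for illustration.

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    # Assumption: one order in the 'paid' state, stored in in-memory SQLite.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO orders VALUES ('order-1', 'paid')")
    conn.commit()
    return conn


def mark_shipped(conn: sqlite3.Connection, order_id: str) -> bool:
    """Conditionally move an order from 'paid' to 'shipped'.

    Returns True if this call performed the transition, False if the
    order was not in 'paid' (for example, a duplicate already shipped it).
    """
    with conn:
        cur = conn.execute(
            "UPDATE orders SET status = 'shipped' "
            "WHERE order_id = ? AND status = 'paid'",
            (order_id,),
        )
    # rowcount is 1 only when the WHERE clause matched, i.e. the first time.
    return cur.rowcount == 1
```

The conditional `WHERE` clause is what makes the transition safe to repeat: a second delivery of the same intent matches no rows and changes nothing.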

Retries and idempotency belong together

I do not think about retries and idempotency separately.

Retry behavior defines how often repeated execution may happen. Idempotency defines whether repeated execution is safe.

That is why I want them designed together:

how often will this retry?
under what conditions?
what is safe to repeat?
when should the system stop retrying and surface failure?

Without that pairing, teams often end up with aggressive retries and weak idempotency, which is one of the fastest ways to make a background system noisy and expensive.
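
As a rough illustration of designing the two together, the sketch below puts the four questions into one small function. Everything here is an assumption for the example: the `TransientError` class, the `run_with_retries` name, and the default attempt count and delays. It also assumes the handler passed in is already safe to repeat.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retries(
    handler: Callable[[], T],
    max_attempts: int = 3,                       # how often will this retry?
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call an idempotent handler with bounded, exponentially spaced retries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return handler()
        except TransientError:                   # under what conditions?
            if attempt == max_attempts:
                raise                            # stop retrying, surface failure
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Any exception other than `TransientError` propagates on the first attempt, so the retry conditions stay explicit instead of accidental. In a managed queue or serverless setup the platform usually owns this loop, but the same four parameters still need answers.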

The real danger is partial success

This is where teams usually get surprised.

The hardest failures are often not full failures. They are partial ones.

The consumer writes one record, emits one event, or updates one subsystem, then fails before it finishes the rest. The retry then arrives and the system has to decide what “already processed” really means.

That is why I like designs where the intended outcome is explicit and where the team has a clear way to detect whether that outcome has already been committed.

If the system cannot answer that clearly, retries become much harder to trust.
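
One possible way to make "already committed" answerable is to track completion per step instead of per event. The sketch below is a simplification under stated assumptions: the `process_event` name and the in-memory `completed` set are hypothetical, and a real system would need a durable store. It also leaves a small window between a step finishing and its marker being recorded, so each step should still tolerate being repeated.

```python
from typing import Callable, Iterable, Set, Tuple

Step = Tuple[str, Callable[[], None]]


def process_event(
    event_id: str, steps: Iterable[Step], completed: Set[Tuple[str, str]]
) -> None:
    """Run each named step at most once per event, resuming after failures.

    `completed` holds (event_id, step_name) pairs for steps that finished
    on an earlier attempt.
    """
    for name, run in steps:
        key = (event_id, name)
        if key in completed:
            continue  # committed on a previous attempt, skip it
        run()  # may raise; steps recorded so far stay recorded
        completed.add(key)
```

If the second of two steps fails, a retry with the same `completed` set skips the first step and resumes at the second, so the record that was already written is not written again.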

My default advice on idempotency

If a system is event-driven, assume retries and duplicate execution are normal conditions, not edge cases.

Design idempotency around the side effects that matter.

Keep the ownership boundary clear.

Make repeated execution harmless wherever possible.

And treat retry policy and idempotency as one design problem, not two.

That is what keeps event-driven systems from turning into workflows that only feel safe when nothing goes wrong.
