Data Engineering · 9 min read · 23 June 2026

Idempotency Keys: The Pattern That Stops Duplicate Records in Their Tracks

Duplicate records are the most common data quality problem in automation stacks. Idempotency keys prevent them at the source. Here's the pattern explained for non-developers — and where to apply it.


Haroon Mohamed

AI Automation & Lead Generation

The duplicate problem

Open most CRMs that have been running for a year or more and you'll find duplicates everywhere. Same person, multiple contacts. Same deal, multiple opportunities. Same form submission processed twice. Each instance dirties the data a little; cumulatively, the data becomes unreliable.

Most of these duplicates aren't users making mistakes. They're systems behaving correctly under conditions where the same event gets delivered twice — webhook retries, network timeouts, automation retries, browser refreshes. Without protection against this, every retry produces a duplicate.

The fix is a pattern called idempotency. It's a standard concept in software engineering, and it's directly applicable to no-code automation work — but rarely implemented because most builders haven't been taught it.

This post is the practical version: what idempotency means, why duplicates happen, and how to apply the pattern using tools you already have.


What "idempotent" actually means

A function or operation is idempotent if calling it multiple times produces the same result as calling it once.

Examples:

  • "Set the temperature to 70" is idempotent. Run it once or run it ten times — final temperature is 70.
  • "Increase the temperature by 1" is not idempotent. Running it ten times adds 10, not 1.

For automation work, the relevant version: when an event (lead submission, payment, form fill, webhook) arrives multiple times, the system handles it as one event, not many.
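The temperature examples above can be sketched as a toy thermostat in Python — nothing here is from a real library, just the two operations side by side:

```python
class Thermostat:
    """Toy model contrasting an idempotent and a non-idempotent operation."""

    def __init__(self, temperature: int = 65):
        self.temperature = temperature

    def set_temperature(self, value: int) -> None:
        # Idempotent: the final state depends only on the argument,
        # not on how many times the call is repeated.
        self.temperature = value

    def increase_temperature(self, delta: int = 1) -> None:
        # Not idempotent: every repeat changes the state further.
        self.temperature += delta


t = Thermostat()
for _ in range(10):
    t.set_temperature(70)      # ten calls, same result as one
print(t.temperature)           # 70

for _ in range(10):
    t.increase_temperature(1)  # ten calls, ten increments
print(t.temperature)           # 80
```

A webhook retry is the automation equivalent of that loop: the same call arriving ten times. You want your workflows to behave like `set_temperature`, not `increase_temperature`.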


Why duplicates happen

Several common causes, all of which are normal system behavior:

Webhook retries. Many webhook providers retry delivery if they don't receive a 200 response within a few seconds. Your system might have actually received and processed the webhook, but if the response was slow, the provider retries. You process it twice.

Browser refresh on form submit. Someone submits a form, the response is slow, they refresh, and the form submits again. Now you have two leads.

API timeouts. An automation calls an API, the call times out, the automation retries. The API actually executed the original call but the response didn't come back. Now the action happens twice.

Manual re-runs. Someone re-runs a workflow that already partially ran. Without idempotency, every action that already happened happens again.

Multiple integration paths. A lead source pushes data via webhook AND polls an endpoint. Both deliver the same data. Without dedup, you get two copies.

These aren't edge cases. They happen continuously in any production automation stack.


The idempotency key pattern

The standard solution: every event carries a unique identifier. The system tracks which identifiers it has already processed. When an event arrives, the system checks: have I already processed this? If yes, skip. If no, process it and remember the identifier.

This identifier is called the idempotency key. It can be:

  • A unique ID generated by the source system (a Stripe payment ID, a form submission ID)
  • A composite of meaningful attributes (email + timestamp rounded to the minute)
  • A hash of the event payload

The exact form doesn't matter. What matters is that the same logical event always produces the same key.
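As a sketch of the third option — hashing the payload — serializing the event with sorted keys before hashing makes the key stable even if field order varies between deliveries (the field names below are illustrative, not from any specific provider):

```python
import hashlib
import json


def idempotency_key(event: dict) -> str:
    # Canonical (sorted-key, compact) JSON rendering, so the same logical
    # event always serializes to the same bytes and hashes to the same key.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"email": "jane@example.com", "form": "quote"}
b = {"form": "quote", "email": "jane@example.com"}  # same event, reordered fields
assert idempotency_key(a) == idempotency_key(b)
```

One caveat: if the provider adds a fresh delivery timestamp to each retry's payload, exclude that field before hashing, or every retry will hash differently.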


Implementing idempotency in no-code automations

Here's the practical pattern for Make.com, Zapier, n8n, or similar tools.

Step 1: Identify the idempotency key for your event.

For each automation, decide what makes an event "the same event" if it arrives twice. Common choices:

  • For webhooks from third-party services: use their unique event ID if provided
  • For form submissions: use the submission ID if the form provider includes one
  • For payment events: use the transaction ID
  • For generic events: hash the relevant fields (email + form_name + timestamp_minute)

Step 2: Maintain a "seen" log.

A simple data store of idempotency keys you've already processed. Options:

  • A Google Sheet with a column for keys
  • An Airtable base
  • A dedicated table in your CRM
  • A small Redis or database if you're more technical

The data store needs two things: ability to check if a key exists, and ability to add a new key.

Step 3: At the start of each automation, check the log.

The first action in your workflow:

  1. Compute or extract the idempotency key from the event
  2. Check the seen log for that key
  3. If found, exit early — log "duplicate ignored" if you want
  4. If not found, continue with the workflow
  5. At a successful end of the workflow, write the key to the seen log

This is 2-3 extra modules per automation. Cheap insurance.
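The five steps above translate to a short sketch. Here an in-memory SQLite table stands in for your Google Sheet or Airtable seen-log, and `process` stands in for the rest of the workflow:

```python
import sqlite3

# In-memory store standing in for a Google Sheet or Airtable "seen" log.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE seen (key TEXT PRIMARY KEY,"
    " processed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)


def handle_event(key: str, process) -> bool:
    """Run `process()` only if `key` hasn't been seen; return True if processed."""
    if db.execute("SELECT 1 FROM seen WHERE key = ?", (key,)).fetchone():
        return False                                          # duplicate: exit early
    process()                                                 # the real workflow
    db.execute("INSERT INTO seen (key) VALUES (?)", (key,))   # remember on success
    return True


created = []
handle_event("sub_123", lambda: created.append("lead"))
handle_event("sub_123", lambda: created.append("lead"))  # retry: ignored
print(len(created))  # 1
```

In Make.com or Zapier the same shape is a search step, a filter/router that stops on a match, and a final "add row" step.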

Step 4: Set retention on the log.

The log shouldn't grow forever. For most workflows, retaining keys for 30-90 days is enough — duplicate retries don't typically arrive months later. Older entries can be archived or deleted.
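Retention can be a small scheduled cleanup that deletes keys past the cutoff. A minimal sketch, again using SQLite as a stand-in for the seen-log (ISO-8601 UTC timestamps compare correctly as text):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE seen (key TEXT PRIMARY KEY, processed_at TEXT)")

now = datetime.now(timezone.utc)
db.executemany(
    "INSERT INTO seen VALUES (?, ?)",
    [
        ("old_key", (now - timedelta(days=120)).isoformat()),
        ("recent_key", (now - timedelta(days=5)).isoformat()),
    ],
)

# Retention: delete keys older than 90 days.
cutoff = (now - timedelta(days=90)).isoformat()
db.execute("DELETE FROM seen WHERE processed_at < ?", (cutoff,))

remaining = [row[0] for row in db.execute("SELECT key FROM seen")]
print(remaining)  # ['recent_key']
```

In a Sheet or Airtable, the equivalent is a weekly scenario that filters rows by the timestamp column and deletes the old ones.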


A specific example: form submission duplicate prevention

Concrete walk-through.

Setup: Your website has a "Request a Quote" form. The form provider sends webhooks to your Make.com scenario when submissions occur.

The duplicate risk: The form provider retries failed webhooks for up to 24 hours. Your scenario takes 8 seconds to run; sometimes the response is slow and the provider retries. Without idempotency, every slow run produces a duplicate lead.

The implementation:

  1. The form provider includes a submission_id field in the webhook payload — unique per submission.

  2. Your scenario's first step: lookup submission_id in an Airtable table called "Processed Submissions."

  3. If found: log "duplicate ignored" and exit.

  4. If not found: continue with the normal workflow — create the lead in CRM, send notification, fire off automation.

  5. As the last step: insert the submission_id into the "Processed Submissions" table with a timestamp.

Result: even if the form provider retries the webhook 5 times, only the first run creates a lead. The other 4 see the existing key and exit early.


When natural keys don't exist

Some events don't have a built-in unique identifier. In these cases, build a synthetic key:

Email + form name + minute timestamp:

hash(email + form_name + floor(timestamp / 60))

If the same email submits the same form within the same minute, treat it as a duplicate. This catches accidental double-clicks without preventing intentional re-submissions hours later.
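A minimal Python version of that synthetic key — the normalisation choices, like lowercasing and trimming the email, are assumptions you can adjust:

```python
import hashlib


def synthetic_key(email: str, form_name: str, timestamp: float) -> str:
    # Floor the Unix timestamp to the minute so retries and double-clicks
    # within the same minute collapse to one key.
    minute = int(timestamp // 60)
    raw = f"{email.strip().lower()}|{form_name}|{minute}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


k1 = synthetic_key("jane@example.com", "quote", 1_700_000_000.0)
k2 = synthetic_key("Jane@Example.com ", "quote", 1_700_000_030.0)  # same minute
k3 = synthetic_key("jane@example.com", "quote", 1_700_003_600.0)   # an hour later
assert k1 == k2 and k1 != k3
```

Note the edge case: two submissions 30 seconds apart can still straddle a minute boundary and produce different keys. For most forms that residual risk is acceptable; if it isn't, widen the window.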

Email + amount + day:

For payment-style events: same email, same amount, same day → likely the same transaction.

The trick is choosing fields specific enough to identify true duplicates but loose enough to handle expected variation (e.g., timestamp differing by milliseconds shouldn't make events different).


Idempotency at the destination, not just the source

The pattern above prevents your automation from running twice. There's a complementary pattern: making the destination operations idempotent.

If your automation creates a CRM contact, instead of "create contact," use "create-or-update contact based on email." The operation is now idempotent at the destination — running it twice doesn't create two contacts.

Most modern CRMs and APIs support this:

  • HubSpot's "create or update" patterns
  • GoHighLevel's contact dedup on email
  • Stripe's `Idempotency-Key` request header (a genuine built-in idempotency feature you should always use)
  • Airtable's upsert operations

When the destination supports idempotent operations natively, use them. They're more reliable than maintaining your own seen-log.
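The create-or-update idea can be sketched with a toy in-memory "CRM" keyed on email. Real CRM APIs differ in detail, but the upsert shape is the same:

```python
# A minimal in-memory "CRM" whose create_or_update is idempotent on email.
contacts: dict[str, dict] = {}


def create_or_update(email: str, **fields) -> dict:
    # Upsert: email is the natural key, so repeats never create a second record.
    record = contacts.setdefault(email, {"email": email})
    record.update(fields)
    return record


create_or_update("jane@example.com", name="Jane")
create_or_update("jane@example.com", phone="555-0101")  # retry or second event
print(len(contacts))  # 1
```

Running it twice, or two hundred times, still leaves exactly one contact; later calls just refresh the fields.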


What about partial duplicates?

A more subtle case: events that aren't strict duplicates but are conceptually the same.

Example: a lead fills out the same form twice with slightly different information (corrected typo, different phone number). Each submission has a unique submission_id, so neither is a "duplicate" by ID. But you don't want two CRM contacts.

The solution is layered:

  • At the workflow level: idempotency keys prevent webhook retries from creating duplicates
  • At the data level: dedup logic merges contacts that are likely the same person (matching email or phone)

These are separate concerns. Don't try to combine them into one mechanism.
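The data-level check can start as simply as a match on email or phone. A sketch under that assumption (production dedup logic usually normalises values and weighs multiple signals):

```python
def same_person(a: dict, b: dict) -> bool:
    # Data-level dedup: treat two records as the same person if either
    # the email or the phone number matches (empty values never match).
    if a.get("email") and a.get("email") == b.get("email"):
        return True
    if a.get("phone") and a.get("phone") == b.get("phone"):
        return True
    return False


first = {"email": "jane@example.com", "phone": "555-0101"}
second = {"email": "jnae@example.com", "phone": "555-0101"}  # typo'd email, same phone
print(same_person(first, second))  # True
```

This runs after the workflow-level key check, typically as a "search contact by email or phone, update if found, create if not" branch.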


Common mistakes

Using timestamps as idempotency keys. Timestamps differ between retries. Using them as keys means every retry looks like a different event, defeating the purpose.

Not handling the case where the seen-log write fails. If you process the event but fail to write the key, the next retry will reprocess. Either accept this risk (rare and recoverable) or write the key first and the operation second (with rollback if the operation fails).
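The write-key-first variant looks like this in sketch form; the rollback on failure is what lets a later retry reprocess:

```python
seen: set[str] = set()


def handle_write_first(key: str, process) -> bool:
    """Record the key before running the operation; roll back on failure."""
    if key in seen:
        return False         # already processed (or in flight): skip
    seen.add(key)            # write the key BEFORE the operation
    try:
        process()
    except Exception:
        seen.discard(key)    # rollback so a later retry can reprocess
        raise
    return True


results = []
try:
    handle_write_first("evt_1", lambda: 1 / 0)  # operation fails mid-run
except ZeroDivisionError:
    pass
handle_write_first("evt_1", lambda: results.append("done"))  # retry succeeds
print(results)  # ['done']
```

The trade-off flips: write-first risks dropping an event if the rollback itself fails, whereas write-last risks a duplicate. Pick whichever failure is cheaper for that workflow.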

Maintaining keys forever. A 5-year-old key is wasted storage. Implement retention.

Forgetting to apply the pattern to retries inside automations. If your workflow has a step that retries on failure, that step also needs idempotency, or one transient failure produces multiple downstream effects.

Trusting idempotency keys from sources that don't actually guarantee them. Some webhook providers reuse "unique" IDs across retries; others rotate them. Verify behavior before relying on it.


Where to apply this first

Prioritize idempotency where the cost of duplicates is highest:

  • Payment processing. A duplicate charge produces customer complaints and chargebacks. Always idempotent.
  • Lead intake. Duplicate contacts pollute the CRM. High priority.
  • Order/booking creation. A duplicate booking confuses scheduling. High priority.
  • Notifications. A duplicate Slack message is annoying but recoverable. Medium priority.
  • Reporting events. A duplicate report generation is wasteful but rarely user-visible. Lower priority.

Apply idempotency to the high-stakes workflows first. The lower-stakes ones can wait until the pattern is well-understood.


The compounding payoff

Implementing idempotency feels like extra work for marginal benefit. The payoff isn't visible immediately because most events don't actually duplicate.

Over time, the payoff is data quality. CRMs that started with idempotent intake have clean contact records years later. CRMs that didn't are full of duplicates that operators eventually try to clean up — usually badly, often making it worse.

Pay the small cost upfront. The data integrity dividend runs for the lifetime of the system.


If you want help building idempotency-aware automations and clean data infrastructure, let's talk.

Need This Built?

Ready to implement this for your business?

Everything in this article reflects real systems I've built and operated. Let's talk about yours.


Haroon Mohamed

Full-stack automation, AI, and lead generation specialist. 2+ years running 13+ concurrent client campaigns using GoHighLevel, multiple AI voice providers, Zapier, APIs, and custom data pipelines. Founder of HMX Zone.
