Automation Error Handling: Why Silent Failures Are Your Biggest Risk
Most automation breakdowns happen silently. Here's how to build error handling into Make.com, n8n, and Zapier workflows so you catch problems before they cost you.
Haroon Mohamed
AI Automation & Lead Generation
The problem with "it just works"
Most automations are built, tested with 1-2 records, and declared done. They run quietly for weeks or months. Then one day:
- You realize 30% of leads from last month never got welcome emails.
- A client calls asking why they never got a quote three weeks ago.
- Your calendar sync has silently failed for 45 days.
- Your CRM is missing 500 contacts from a failed webhook batch.
Nobody noticed because nobody built error handling. The automation stopped working — silently — and the business kept running on faith.
Error handling is the discipline that makes automation reliable at scale.
Types of automation failures
Silent failures
The automation runs but does nothing. No error is raised. You only notice when expected outcomes don't happen.
Examples:
- Webhook delivered but content is empty — automation triggers, does nothing useful
- API call returns 200 but with {"success": false} in the body — technically successful, actually failed
- Null field breaks a downstream step, causing the step to skip without error
Loud failures
The automation throws an obvious error. Most tools have error logs for these.
Examples:
- API rate limit hit (429 error)
- Invalid credentials (401 error)
- Missing required field
Degraded failures
The automation partially succeeds. Some records processed, some didn't.
Examples:
- Bulk update fails halfway through — first 50 contacts updated, remaining 200 skipped
- One step in a 10-step flow times out — 9 steps succeeded, 1 failed
Each type needs different handling.
The 5 layers of error handling
Layer 1: Input validation
Stop bad data before it enters the pipeline.
- Required fields check
- Format validation (email looks like email, phone is E.164)
- Range checks (deal amount isn't negative)
- Existence checks (referenced record actually exists)
If validation fails, branch to an error path instead of processing.
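As a sketch, that gate can live in a single Code step. The field names below (email, phone, amount) are illustrative; adapt them to your payload:

// Minimal validation gate: returns either a clean record or an error-path record.
function validate(record) {
  const errors = [];
  if (!record.email) errors.push('missing email');
  else if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(record.email)) errors.push('invalid email format');
  if (record.phone && !/^\+[1-9]\d{1,14}$/.test(record.phone)) errors.push('phone not E.164');
  if (typeof record.amount === 'number' && record.amount < 0) errors.push('negative amount');
  return errors.length
    ? { valid: false, errors, record }  // route to the error path
    : { valid: true, record };          // safe to process
}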
Layer 2: Retry logic
For transient failures (network timeouts, rate limits), retry with backoff.
Exponential backoff:
- First retry: wait 2 seconds
- Second retry: wait 4 seconds
- Third retry: wait 8 seconds
- After N retries: give up and escalate
Most tools support retry natively: Make.com can auto-retry failed runs via a Break error handler and incomplete executions, n8n has a per-node "Retry On Fail" setting, and Zapier can automatically replay failed runs on paid plans.
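When you need custom backoff inside a Code step, here's a minimal sketch (the delays and retry count are the arbitrary defaults from the list above):

// Retry an async operation with exponential backoff: 2s, 4s, 8s, then give up.
async function withRetry(fn, maxRetries = 3, baseDelayMs = 2000) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;   // out of retries: escalate
      const delay = baseDelayMs * 2 ** attempt;  // 2s, 4s, 8s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}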
Layer 3: Fallback paths
When the primary path fails, use a backup.
Example: Lead enrichment workflow:
- Try Apollo for contact data
- If Apollo fails, try Clearbit
- If Clearbit fails, try Hunter
- If all fail, log to "Manual Review" sheet and continue
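In code form, the same cascade looks like this sketch (tryApollo, tryClearbit, tryHunter, and logForManualReview are hypothetical wrappers around the respective APIs and the review sheet):

// Try each enrichment provider in order; fall through on failure.
async function enrich(lead) {
  const providers = [tryApollo, tryClearbit, tryHunter];
  for (const provider of providers) {
    try {
      return await provider(lead);
    } catch (error) {
      // this provider failed; try the next one
    }
  }
  await logForManualReview(lead);  // all providers failed
  return null;                     // continue the workflow without enrichment
}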
Layer 4: Error logging and alerting
Record every failure. Alert humans when meaningful.
Log to:
- Make.com's execution history (built-in, limited retention)
- Google Sheet or Supabase (permanent, queryable)
- Slack message to #automation-alerts
Alert when:
- Error rate exceeds threshold (e.g., >5% of runs fail)
- Critical automation fails even once (lead routing, payment processing)
- Cumulative failures in a day exceed normal baseline
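One way to implement the threshold alert, sketched with placeholders (SLACK_WEBHOOK_URL stands in for an incoming-webhook URL; the run counts would come from your log sheet or table):

// Alert when today's error rate crosses the 5% threshold.
async function checkErrorRate(runsToday, failuresToday) {
  const rate = runsToday ? failuresToday / runsToday : 0;
  if (rate > 0.05) {
    await fetch(SLACK_WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        text: `Automation error rate is ${(rate * 100).toFixed(1)}% today (${failuresToday}/${runsToday} runs).`,
      }),
    });
  }
}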
Layer 5: Dead letter queue
For records that can't be processed after retries, queue them for human review.
Implementation:
- Supabase table: failed_records (id, workflow_name, payload_json, error_message, created_at, resolved_at)
- Every failure: INSERT into this table
- Admin UI to review and either retry or mark as resolved
- Daily alert if queue is non-empty
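Pushing a failure onto that queue with supabase-js looks roughly like this (table and column names match the schema above; credentials come from environment variables):

// Record an unprocessable item in the dead letter queue.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_KEY);

async function deadLetter(workflowName, payload, errorMessage) {
  await supabase.from('failed_records').insert({
    workflow_name: workflowName,
    payload_json: payload,        // keep the full payload so the record can be replayed
    error_message: errorMessage,
    created_at: new Date().toISOString(),
  });
}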
Implementation in Make.com
Error handlers
Every module can have an error handler. Right-click the module → "Add error handler." This creates a branch that runs if the module fails.
Common error handler patterns:
Pattern 1: Log and continue
Error from API call → Log to Google Sheet → Ignore (continue scenario)
Pattern 2: Wait and retry
Error from API call → Wait 30 seconds → Retry → If still error, escalate
Pattern 3: Alert and stop
Critical error → Slack alert to admin → Scenario stops
Ignore vs. Resume vs. Break vs. Commit vs. Rollback
Make's error handler directives:
- Ignore: drop the failed bundle and continue as if nothing happened
- Resume: continue the scenario, substituting a fallback output for the failed module
- Break: store the run as an incomplete execution so it can be reviewed or retried later
- Commit: stop and commit the changes already made (for transactional modules)
- Rollback: stop and reverse any partial writes (for modules that support transactions)
For most automation: use "Ignore" or "Resume" for non-critical errors, and "Break" for critical errors so the failed run is preserved and can be retried after the fix.
Scenario-level error notifications
Make → Scenario settings → "Receive a notification if the scenario encounters an error." Gets an email on failure. Basic but essential.
Implementation in n8n
Error Trigger node
n8n has a special "Error Trigger" node. Create a separate workflow that runs only when another workflow fails. The error workflow receives details about the failure and can send alerts, retry, or log.
Setup:
- Create Error Workflow with Error Trigger node
- Add Slack/email node to notify
- In each production workflow: Settings → "Error Workflow" → select your error workflow
- Any failure in the production workflow triggers the error workflow
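The Error Trigger's output includes the failing workflow's name, the last node executed, and the error message (shape per n8n's documented example; verify against your version). A Code node in the error workflow can turn that into a Slack-ready message:

// Build an alert message from the Error Trigger's output.
const data = $input.first().json;
return [{
  json: {
    text: `Workflow "${data.workflow.name}" failed at node ` +
          `"${data.execution.lastNodeExecuted}": ${data.execution.error.message}`,
  },
}];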
Retry on error
n8n node settings → "Retry On Fail" → set max retries, wait between retries. Handles transient failures automatically.
Try-catch with IF nodes
For custom logic, wrap the risky operation in a Code node with try-catch:
try {
  // n8n's Code node exposes this.helpers.httpRequest for making HTTP calls
  const result = await this.helpers.httpRequest({
    method: 'POST',
    url: '...',
    body: { ... },
  });
  return [{ json: { success: true, data: result } }];
} catch (error) {
  // Surface the failure as data so downstream nodes can branch on it
  return [{ json: { success: false, error: error.message } }];
}
Then branch downstream on success === true.
Implementation in Zapier
Zapier's error handling is weaker than Make/n8n, but workable.
Path logic
Use "Paths" to branch on outcome. Conditional logic lets you route based on success/failure of earlier steps.
Error notification
Zapier → Settings → Notifications → Email on failure. Basic but essential.
Sub-zaps for retry
Create a secondary zap that works a "failed records" queue: the main zap writes to the queue on error, and the sub-zap retries from the queue hourly.
Premium: Storage
Zapier Storage lets you persist values across zap runs. Use it for:
- Idempotency (store processed IDs)
- Failure queues (store failed records with retry count)
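For example, an idempotency check in a Code by Zapier step can hit Storage's REST API directly (endpoint and X-Secret auth per Zapier's Storage docs; STORE_SECRET and inputData.recordId are placeholders):

// Skip records we've already processed, using Storage by Zapier.
const headers = { 'X-Secret': STORE_SECRET, 'Content-Type': 'application/json' };
const key = `processed_${inputData.recordId}`;

// Has this record been seen before?
const res = await fetch(`https://store.zapier.com/api/records?key=${key}`, { headers });
const existing = await res.json();

if (existing[key]) {
  output = { skip: true };   // duplicate; let a Filter step stop the zap here
} else {
  await fetch('https://store.zapier.com/api/records', {
    method: 'POST',
    headers,
    body: JSON.stringify({ [key]: true }),  // mark as processed
  });
  output = { skip: false };
}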
What to monitor
Error rate
% of runs that fail. Baseline it over the first 2 weeks. Alert when >2x baseline.
Execution time
Runs that take much longer than baseline indicate problems (API slowdown, rate limits, data volume shifts).
Throughput
Expected events per hour/day. If a normally-busy webhook is silent for 4 hours, alert — something might be broken upstream.
Specific failure patterns
Same error 50 times in a row = not a transient issue. Needs attention.
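A throughput heartbeat is a few lines once you can query your event log (getLastEventTime is a hypothetical lookup; the 4-hour window matches the example above):

// Alert if a normally-busy webhook has been silent too long.
async function checkHeartbeat(sendAlert) {
  const last = await getLastEventTime('lead-webhook');  // Date of the most recent event
  const silentHours = (Date.now() - last.getTime()) / 3_600_000;
  if (silentHours > 4) {
    await sendAlert(`lead-webhook has been silent for ${silentHours.toFixed(1)} hours`);
  }
}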
Common error scenarios and handling
API rate limit (429)
Handle: Retry with exponential backoff. If still failing after retries, slow the upstream trigger or batch requests.
Authentication failure (401)
Handle: Stop retrying immediately (retry won't fix). Alert admin to refresh credentials.
Network timeout
Handle: Retry 2-3 times with short delay. If still failing, log and skip.
Data format error
Handle: Don't retry (won't fix). Log with payload so human can see what was malformed. Route to dead letter queue.
Missing required field
Handle: Validate at start. If missing, log and skip. Don't process incomplete data.
Duplicate record
Handle: Use UPSERT instead of INSERT. Treats duplicates as updates instead of errors.
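With supabase-js, for instance, that's an upsert keyed on the unique column (column names illustrative; reuses the client from the dead letter sketch above):

// Treat duplicates as updates: upsert on the unique email column.
const { error } = await supabase
  .from('contacts')
  .upsert({ email: lead.email, name: lead.name }, { onConflict: 'email' });
if (error) throw error;  // anything else is a real failure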
Testing error handling
Most teams build happy-path automations and never test failure modes. Test error handling before production by:
- Disconnect an integration: revoke OAuth token, see if your alert fires
- Feed bad data: submit a form with invalid email, see if validation catches it
- Rate limit yourself: temporarily set a low API limit, see if retry logic works
- Delete required field: temporarily remove a field the automation needs
If your error handling holds up in all four scenarios, you're ahead of most deployments.
The cost of no error handling
For a typical SMB with a few critical automations:
- 1% silent failure rate = 1 lead out of every 100 lost
- At $100 average deal value and 20% close rate, that's $20 lost per 100 leads
- At 500 leads/month = $100/month silently leaked
- Over a year = $1,200 in lost deals from failures nobody noticed
Multiply by multiple automations, multiply by higher deal values, and the cost of "automation just works" becomes serious.
Error handling investment: 4-8 hours per critical automation. ROI: obvious within months.
Sources
Error handling patterns are standard across engineering literature (Release It! by Michael Nygard, Site Reliability Engineering by Google). Tool-specific implementations verified against current documentation for Make.com, n8n, and Zapier. Pricing and feature details as of April 2026.
Need help auditing error handling in your existing automations? Let's talk — a 1-day engagement typically finds 3-10 silent failure points worth fixing.
Need This Built?
Ready to implement this for your business?
Everything in this article reflects real systems I've built and operated. Let's talk about yours.
Haroon Mohamed
Full-stack automation, AI, and lead generation specialist. 2+ years running 13+ concurrent client campaigns using GoHighLevel, multiple AI voice providers, Zapier, APIs, and custom data pipelines. Founder of HMX Zone.