Data Engineering · 7 min read · 18 April 2026

Data Normalization for CRM Contacts: Fixing the Mess Before It Gets Worse

A practical guide to normalizing contact data in your CRM — phone numbers, names, emails, addresses — and the tools that make it automatic.

Haroon Mohamed

AI Automation & Lead Generation

Why data normalization matters

Contact data in CRMs gets inconsistent fast. Different forms, different reps, different lead sources, different imports — each creates its own formatting conventions.

The damage:

  • Duplicates that don't look like duplicates (+1 555 123 4567 vs. 555-123-4567)
  • Broken SMS sends (phone numbers in wrong format for Twilio)
  • Failed email suppression and deduplication (the same address stored in different cases counts as two contacts)
  • Unusable reports (50 variations of "New York" making analysis impossible)

Normalization fixes all of this. Here's how.


Phone number normalization

The gold standard: E.164 format (+1XXXXXXXXXX for US, +44XXXXXXXXXX for UK, etc.)

Why E.164: it's the international standard, required by Twilio, supported by every modern telephony system.

Common incoming formats:

  • (555) 123-4567
  • 555-123-4567
  • 555.123.4567
  • 5551234567
  • 1-555-123-4567
  • +1 555 123 4567

All should normalize to +15551234567.
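A minimal sketch of that normalization in plain JavaScript, assuming any bare 10-digit number is a US number (the `toE164US` function name is illustrative; production code should use a real library instead):

```javascript
// Minimal sketch: normalize common US phone formats to E.164.
// Assumes 10-digit numbers are US numbers — use libphonenumber-js for real work.
function toE164US(raw) {
  const digits = String(raw).replace(/\D/g, ''); // keep digits only
  if (digits.length === 10) return '+1' + digits;            // 5551234567
  if (digits.length === 11 && digits.startsWith('1')) {
    return '+' + digits;                                     // 1-555-123-4567
  }
  return null; // can't normalize safely — flag for manual review
}
```

Returning `null` instead of guessing keeps bad input visible rather than silently producing a wrong number.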

Implementation options

GoHighLevel: Native phone format is E.164. Use the "Phone" field type, not a generic text field. Validate on form submission.

Make.com:

{{replace(replace(replace(replace(phone; "("; ""); ")"; ""); "-"; ""); " "; "")}}

Then prepend country code if missing.

n8n (Code node, JavaScript):

// strip everything that isn't a digit
const digitsOnly = phone.replace(/\D/g, '');
// assume 10-digit numbers are US; otherwise trust the leading country code
const withCountryCode = digitsOnly.length === 10 ? '+1' + digitsOnly : '+' + digitsOnly;

Libraries: For heavy-duty normalization, use libphonenumber-js (validates and formats based on country).


Name normalization

Common problems:

  • All uppercase ("JOHN SMITH")
  • All lowercase ("john smith")
  • Double spaces ("John  Smith")
  • Leading/trailing spaces (" John Smith ")
  • Unicode weirdness ("Müller" → "M?ller" if encoding fails)

Target format: Title Case, trimmed, single spaces.

Implementation

Make.com:

{{capitalize(lower(trim(firstName)))}}

(Capitalize only uppercases the first character of the whole string, so "john smith" becomes "John smith". For full title case, use a custom function.)

Proper title case (JavaScript):

const normalized = name
  .trim()                                  // drop leading/trailing spaces
  .replace(/\s+/g, ' ')                    // collapse runs of whitespace
  .toLowerCase()
  .replace(/\b\w/g, c => c.toUpperCase()); // uppercase each word's first letter

Gotcha: names with particles (van der, de la, O'Brien). Simple title case breaks these. For most B2B/B2C, it's good enough. For sensitive international contexts, use a name parser library.
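If particles matter for your data, one pragmatic middle ground is a small lowercase-particle list layered on top of simple title casing. A sketch (the particle list is illustrative, not exhaustive):

```javascript
// Title case that keeps a few common name particles lowercase.
// Particle list is illustrative — real name handling needs a library or review.
const PARTICLES = new Set(['van', 'der', 'de', 'la', 'von', 'di']);

function titleCaseName(name) {
  return name
    .trim()
    .replace(/\s+/g, ' ')
    .toLowerCase()
    .split(' ')
    .map((word, i) =>
      // keep particles lowercase unless they start the name
      i > 0 && PARTICLES.has(word)
        ? word
        : word.replace(/\b\w/g, c => c.toUpperCase()) // handles O'Brien too
    )
    .join(' ');
}
```

Note the `\b\w` replacement already handles apostrophe names like O'Brien, since the apostrophe creates a new word boundary.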


Email normalization

Email domains are case-insensitive, and while RFC 5321 technically permits case-sensitive local parts, no major provider enforces that — yet some systems still compare emails case-sensitively. Best practice: always lowercase emails before storing.

Additional normalization:

  • Trim whitespace
  • Strip +tags from Gmail addresses if you want to deduplicate (user+promo@gmail.com = user@gmail.com)
  • Strip dots from Gmail addresses (u.s.e.r@gmail.com = user@gmail.com) — but be careful, this is Gmail-specific

Deduplication math

If you have 10,000 contacts and 3% are duplicates from email case differences, that's 300 duplicate contacts. Sending campaigns to both = wasted sends + spam risk + confused customers.
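That duplicate count can be measured directly on an export: group contacts by lowercased, trimmed email and count every record beyond the first in each group. A sketch, assuming a simple `[{ email }]` contact shape:

```javascript
// Count contacts that are duplicates once emails are trimmed and lowercased.
// `contacts` shape is assumed: [{ email: '...' }, ...].
function countCaseDuplicates(contacts) {
  const seen = new Map(); // normalized email -> occurrence count
  for (const { email } of contacts) {
    const key = email.trim().toLowerCase();
    seen.set(key, (seen.get(key) || 0) + 1);
  }
  let dupes = 0;
  for (const count of seen.values()) {
    if (count > 1) dupes += count - 1; // everything past the first is a duplicate
  }
  return dupes;
}
```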

Implementation

const normalized = email.trim().toLowerCase();

For Gmail deduplication (optional, only if you really want to combine user+x@gmail.com with user@gmail.com):

function normalizeGmail(email) {
  const lower = email.trim().toLowerCase();
  const [local, domain] = lower.split('@');
  if (domain !== 'gmail.com' && domain !== 'googlemail.com') return lower;
  const withoutTag = local.split('+')[0];
  const withoutDots = withoutTag.replace(/\./g, '');
  return `${withoutDots}@gmail.com`;
}

Address normalization

Addresses are the hardest to normalize. Free-text fields produce infinite variations.

Common problems:

  • Abbreviations vs. spellings ("St." vs. "Street", "Ave" vs. "Avenue")
  • State codes vs. full names ("CA" vs. "California")
  • Zip code variations ("10001" vs. "10001-1234")
  • Country inconsistencies ("USA" vs. "United States" vs. "US")

Implementation options

USPS Address Standardization API: Free for US addresses. Normalizes to USPS standard.

Google Geocoding API: $200/month free credit, then $5/1000 requests. Handles global addresses.

Smarty (SmartyStreets): Paid ($50+/month). Most accurate commercial option.

GoHighLevel approach: Split into structured fields — address1, address2, city, state, postal_code, country. Validate at form submission with dropdowns for state/country.

For most small businesses: structured fields with dropdowns at form level + lowercase + trim. Skip address standardization APIs unless USPS accuracy matters for mailing.


State/Country normalization

Use ISO codes as storage format:

  • States: Two-letter postal codes (CA, NY, TX)
  • Countries: ISO 3166-1 alpha-2 (US, GB, CA, AU)

Display in UI with full names; store as codes. Mapping is straightforward.

GoHighLevel:

Custom field type "Dropdown" with ISO codes as values and full names as display labels.

Make.com mapping:

Use a Data Store module with a state/country mapping table, or hardcode in a function.
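Hardcoded in a function, the mapping might look like this sketch (table truncated to a few states — extend it to all 50 in practice):

```javascript
// Map free-text US state input to a two-letter postal code.
// Only a few states shown for illustration.
const STATE_CODES = {
  'california': 'CA', 'new york': 'NY', 'texas': 'TX', 'florida': 'FL',
};

function toStateCode(input) {
  const clean = input.trim().toLowerCase();
  if (/^[a-z]{2}$/.test(clean)) return clean.toUpperCase(); // already a code
  return STATE_CODES[clean] || null; // null = unmapped, needs review
}
```

The same pattern works for countries with an ISO 3166-1 alpha-2 table.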


Source attribution normalization

"Source" is where a contact came from (Facebook, Google, referral, etc.). Reps and forms introduce variations:

  • "Facebook" / "facebook" / "FB" / "Meta Ads"
  • "Google" / "google ads" / "Adwords" / "SEO"
  • "Referral" / "referral" / "client referral" / "friend"

Fix: controlled vocabulary

Define 10-20 canonical source values. Store in a dropdown. Never allow free-text source entry.

Example canonical sources:

  • facebook-ads
  • google-ads
  • organic-search
  • referral-customer
  • referral-partner
  • linkedin
  • direct
  • email-campaign

Every contact gets one. If a source doesn't fit, add a new canonical value (don't create variations).
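For data arriving from sources you can't control (imports, third-party webhooks), an alias table can fold known variations into the canonical vocabulary. A sketch — the alias entries are illustrative and should grow as variations appear:

```javascript
// Fold messy source strings into the canonical vocabulary.
// Alias table is illustrative — extend it as new variations show up.
const SOURCE_ALIASES = {
  'facebook': 'facebook-ads', 'fb': 'facebook-ads', 'meta ads': 'facebook-ads',
  'google': 'google-ads', 'google ads': 'google-ads', 'adwords': 'google-ads',
  'seo': 'organic-search',
  'referral': 'referral-customer', 'client referral': 'referral-customer',
};

function canonicalSource(raw) {
  const clean = raw.trim().toLowerCase();
  return SOURCE_ALIASES[clean] || clean; // unmapped values pass through for review
}
```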


Automating normalization

At form submission (best)

Normalize on the way in. GHL's form builder, HubSpot's forms, Typeform — all let you set field types that force format (phone as phone, email as email).

Via webhook middleware (when forms can't)

If you're ingesting data from external sources (CSV imports, third-party webhooks), route through Make.com or n8n and normalize there before writing to the CRM.

Periodic cleanup (for existing data)

Scheduled workflow: every Sunday, run through all contacts updated in the past week. Normalize phone, email, name. Write back.
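The weekly pass boils down to applying each normalizer to every contact. A minimal sketch, assuming a US-only phone base and hypothetical field names (`phone`, `email`, `firstName`):

```javascript
// Apply phone/email/name normalization to one contact record.
// Field names are assumptions, not tied to any specific CRM.
function normalizeContact(contact) {
  const digits = (contact.phone || '').replace(/\D/g, '');
  return {
    ...contact,
    // assume 10 digits = US; leave anything else untouched for review
    phone: digits.length === 10 ? '+1' + digits : contact.phone,
    email: (contact.email || '').trim().toLowerCase(),
    firstName: (contact.firstName || '')
      .trim()
      .replace(/\s+/g, ' ')
      .toLowerCase()
      .replace(/\b\w/g, c => c.toUpperCase()),
  };
}
```

In an n8n Code node this would map over the incoming items and write the cleaned records back to the CRM.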


The audit process

Quarterly:

  1. Export all contacts
  2. Check phone format consistency (should be 100% E.164 or all same format)
  3. Check email case consistency (should be all lowercase)
  4. Check name formatting (should be title case)
  5. Count source variations — should be your canonical 10-20, not 50
  6. Identify duplicates (same email case-insensitive, same normalized phone)
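Checks 2 and 3 can be scripted against the export. A sketch computing format-consistency percentages, assuming a `[{ phone, email }]` shape:

```javascript
// Compute format-consistency percentages for an exported contact list.
// Expects [{ phone, email }, ...]; field names are assumptions.
function auditContacts(contacts) {
  const total = contacts.length || 1;
  const e164 = contacts.filter(c => /^\+\d{10,15}$/.test(c.phone || '')).length;
  const lowerEmail = contacts.filter(
    c => c.email && c.email === c.email.toLowerCase()
  ).length;
  return {
    phoneE164Pct: Math.round((e164 / total) * 100),  // should be 100
    emailLowerPct: Math.round((lowerEmail / total) * 100), // should be 100
  };
}
```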

If the audit reveals mess: fix the normalization process. Don't just clean the data — the mess will come back next quarter if the intake process is unchanged.


Real example: deduplicating after a CSV import

Scenario: your team imported 5,000 contacts from a lead list. 8% are duplicates of existing records.

Step 1: normalize all phone numbers (new + existing)

Run the normalization pass. Now duplicates surface.

Step 2: find duplicates by normalized phone + email

SQL query (if using Supabase):

SELECT normalized_phone, COUNT(*) 
FROM contacts 
GROUP BY normalized_phone 
HAVING COUNT(*) > 1;

Step 3: merge duplicates

Keep the record with the most recent activity. Copy fields from the older record if newer has gaps. Delete the older.
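That merge rule can be sketched as follows, assuming each record carries an `updatedAt` timestamp (a hypothetical field — substitute whatever your CRM exposes):

```javascript
// Merge two duplicate records: newest wins, the older one fills in blanks.
// `updatedAt` is an assumed field for recency comparison.
function mergeDuplicates(a, b) {
  const [newer, older] =
    new Date(a.updatedAt) >= new Date(b.updatedAt) ? [a, b] : [b, a];
  const merged = { ...newer };
  for (const [key, value] of Object.entries(older)) {
    // copy from the older record only where the newer one has a gap
    if (merged[key] == null || merged[key] === '') merged[key] = value;
  }
  return merged;
}
```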

Step 4: prevent future duplicates

Add unique constraint on normalized_phone and normalized_email in your database. Fail imports that would create duplicates.


Common mistakes

1. Normalizing only on display, not on storage. If your UI lowercases emails but the database stores them mixed-case, searches by email will fail.

2. Skipping normalization because "we'll fix it later." You won't. Bad data compounds. Fix at intake.

3. Over-normalizing and losing information. Stripping Gmail +tags might be useful for deduplication but loses the tag data. Store both the original and the normalized version if the original matters.

4. Normalizing in application code instead of at the database layer. Inconsistent across apps. Enforce at the database via triggers or computed columns.


Sources

This post draws from publicly documented best practices for data normalization (ITU-T E.164 for phone numbers, RFC 5321 for emails, USPS Publication 28 for US addresses, ISO 3166-1 for country codes). Implementation examples are standard patterns used across Make.com, n8n, and custom code deployments.

Need help auditing and normalizing your CRM data? Let's talk — a typical normalization cleanup is a 1-2 day engagement.

Need This Built?

Ready to implement this for your business?

Everything in this article reflects real systems I've built and operated. Let's talk about yours.

Haroon Mohamed

Full-stack automation, AI, and lead generation specialist. 2+ years running 13+ concurrent client campaigns using GoHighLevel, multiple AI voice providers, Zapier, APIs, and custom data pipelines. Founder of HMX Zone.
