Data Normalization for CRM Contacts: Fixing the Mess Before It Gets Worse
A practical guide to normalizing contact data in your CRM — phone numbers, names, emails, addresses — and the tools that make it automatic.
Haroon Mohamed
AI Automation & Lead Generation
Why data normalization matters
Contact data in CRMs gets inconsistent fast. Different forms, different reps, different lead sources, different imports — each creates its own formatting conventions.
The damage:
- Duplicates that don't look like duplicates (+1 555 123 4567 vs. 555-123-4567)
- Broken SMS sends (phone numbers in the wrong format for Twilio)
- Failed email campaigns (uppercase emails rejected by some ESPs)
- Unusable reports (50 variations of "New York" making analysis impossible)
Normalization fixes all of this. Here's how.
Phone number normalization
The gold standard: E.164 format (+1XXXXXXXXXX for US, +44XXXXXXXXXX for UK, etc.)
Why E.164: it's the international standard, required by Twilio, supported by every modern telephony system.
Common incoming formats:
- (555) 123-4567
- 555-123-4567
- 555.123.4567
- 5551234567
- 1-555-123-4567
- +1 555 123 4567
All should normalize to +15551234567.
Implementation options
GoHighLevel: Native phone format is E.164. Use the "Phone" field type, not a generic text field. Validate on form submission.
Make.com:
{{replace(replace(replace(replace(phone; "("; ""); ")"; ""); "-"; ""); " "; "")}}
Then prepend country code if missing.
n8n (Code node, JavaScript):
// Strip every non-digit character
const digitsOnly = phone.replace(/\D/g, '');
// Assume US (+1) for bare 10-digit numbers; otherwise assume the country code is already there
const withCountryCode = digitsOnly.length === 10 ? '+1' + digitsOnly : '+' + digitsOnly;
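Before writing the value back, it's worth validating that the result actually looks like E.164. A minimal regex check (E.164 allows at most 15 digits, and the first digit of a country code is never zero):

```javascript
// Minimal E.164 shape check: a '+' followed by 8 to 15 digits,
// first digit non-zero. Stricter validation needs a real library.
const E164 = /^\+[1-9]\d{7,14}$/;

function isE164(phone) {
  return E164.test(phone);
}
```

Numbers that fail the check should be flagged for manual review rather than silently stored.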
Libraries: For heavy-duty normalization, use libphonenumber-js (validates and formats based on country).
Name normalization
Common problems:
- All uppercase ("JOHN SMITH")
- All lowercase ("john smith")
- Extra spaces ("John  Smith")
- Leading/trailing spaces (" John Smith ")
- Unicode weirdness ("Müller" → "M?ller" if encoding fails)
Target format: Title Case, trimmed, single spaces.
Implementation
Make.com:
{{capitalize(lower(trim(firstName)))}}
(Capitalize only capitalizes the first letter. For full title case, use a custom function.)
Proper title case (JavaScript):
name.trim()
.replace(/\s+/g, ' ')
.toLowerCase()
.replace(/\b\w/g, c => c.toUpperCase());
Gotcha: names with particles (van der, de la, O'Brien). Simple title case breaks these. For most B2B/B2C, it's good enough. For sensitive international contexts, use a name parser library.
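If you do need to handle particles without pulling in a library, a small exception list gets most of the way there. A sketch; the particle list below is illustrative, not exhaustive:

```javascript
// Title-case with a small exception list for name particles.
// The particle set here is a sample; extend it for your contact base.
const PARTICLES = new Set(['van', 'der', 'de', 'la', 'von', 'di', 'da']);

function titleCaseName(name) {
  const words = name.trim().replace(/\s+/g, ' ').toLowerCase().split(' ');
  return words
    .map((w, i) =>
      // keep particles lowercase unless they start the name
      i > 0 && PARTICLES.has(w) ? w : w.replace(/\b\w/g, c => c.toUpperCase())
    )
    .join(' ');
}
```

Apostrophes work out naturally here, since the word-boundary regex uppercases the letter after them.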
Email normalization
Per RFC 5321, the domain part of an email is case-insensitive; local parts are technically allowed to be case-sensitive, but in practice virtually no provider treats them that way. Best practice: always lowercase emails before storing.
Additional normalization:
- Trim whitespace
- Strip +tags from Gmail addresses if you want to deduplicate (user+promo@gmail.com = user@gmail.com)
- Strip dots from Gmail addresses (u.s.e.r@gmail.com = user@gmail.com) — but be careful, this is Gmail-specific
Deduplication math
If you have 10,000 contacts and 3% are duplicates from email case differences, that's 300 duplicate contacts. Sending campaigns to both = wasted sends + spam risk + confused customers.
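Counting how many contacts collapse into the same record once emails are normalized is a one-pass job over an export. A sketch with a made-up list:

```javascript
// Illustrative: count contacts that become duplicates after email
// normalization. The contact list here is made up.
const contacts = ['John@Example.com', 'john@example.com ', 'jane@example.com'];

const seen = new Set();
let caseDupes = 0;
for (const raw of contacts) {
  const key = raw.trim().toLowerCase();
  if (seen.has(key)) caseDupes += 1; // collapses into an existing record
  else seen.add(key);
}
```

Run the same pass over your real export before a campaign to estimate wasted sends.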
Implementation
const normalized = email.trim().toLowerCase();
For Gmail deduplication (optional, only if you really want to combine user+x@gmail.com with user@gmail.com):
function normalizeGmail(email) {
const lower = email.trim().toLowerCase();
const [local, domain] = lower.split('@');
if (domain !== 'gmail.com' && domain !== 'googlemail.com') return lower;
const withoutTag = local.split('+')[0];
const withoutDots = withoutTag.replace(/\./g, '');
return `${withoutDots}@gmail.com`;
}
Address normalization
Addresses are the hardest to normalize. Free-text fields produce infinite variations.
Common problems:
- Abbreviations vs. spellings ("St." vs. "Street", "Ave" vs. "Avenue")
- State codes vs. full names ("CA" vs. "California")
- Zip code variations ("10001" vs. "10001-1234")
- Country inconsistencies ("USA" vs. "United States" vs. "US")
Implementation options
USPS Address Standardization API: Free for US addresses. Normalizes to USPS standard.
Google Geocoding API: $200/month free credit, then $5/1000 requests. Handles global addresses.
Smarty (SmartyStreets): Paid ($50+/month). Most accurate commercial option.
GoHighLevel approach: Split into structured fields — address1, address2, city, state, postal_code, country. Validate at form submission with dropdowns for state/country.
For most small businesses: structured fields with dropdowns at form level + lowercase + trim. Skip address standardization APIs unless USPS accuracy matters for mailing.
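If you still want a light-touch pass over free-text street lines without an API, expanding trailing suffixes covers a lot of the variation. A sketch; the mapping below is a small sample, not the full USPS Publication 28 table:

```javascript
// Illustrative street-suffix expansion for light-touch cleanup.
// A handful of common suffixes; real USPS standardization has hundreds.
const SUFFIXES = { st: 'Street', ave: 'Avenue', blvd: 'Boulevard', rd: 'Road', dr: 'Drive' };

function expandSuffix(street) {
  return street
    .trim()
    .replace(/\s+/g, ' ')
    // only the final word, so "St. Louis Ave" doesn't get mangled mid-string
    .replace(/\b(st|ave|blvd|rd|dr)\.?$/i, (_, abbr) => SUFFIXES[abbr.toLowerCase()]);
}
```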
State/Country normalization
Use ISO codes as storage format:
- States: Two-letter postal codes (CA, NY, TX)
- Countries: ISO 3166-1 alpha-2 (US, GB, CA, AU)
Display in UI with full names; store as codes. Mapping is straightforward.
GoHighLevel:
Custom field type "Dropdown" with ISO codes as values and full names as display labels.
Make.com mapping:
Use a Data Store module with a state/country mapping table, or hardcode in a function.
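Hardcoded, the mapping is a plain lookup table. A sketch with a sample of the table (a real one lists all states):

```javascript
// Tiny state-name to USPS-code map: a sample, not a complete table.
const STATE_CODES = { california: 'CA', 'new york': 'NY', texas: 'TX' };

function toStateCode(input) {
  const key = input.trim().toLowerCase();
  if (/^[a-z]{2}$/.test(key)) return key.toUpperCase(); // already a code
  return STATE_CODES[key] ?? null; // unknown values: flag for manual review
}
```

Returning null for unknowns (instead of passing them through) makes bad values easy to surface in a review queue.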
Source attribution normalization
"Source" is where a contact came from (Facebook, Google, referral, etc.). Reps and forms introduce variations:
- "Facebook" / "facebook" / "FB" / "Meta Ads"
- "Google" / "google ads" / "Adwords" / "SEO"
- "Referral" / "referral" / "client referral" / "friend"
Fix: controlled vocabulary
Define 10-20 canonical source values. Store in a dropdown. Never allow free-text source entry.
Example canonical sources:
- facebook-ads
- google-ads
- organic-search
- referral-customer
- referral-partner
- linkedin
- direct
- email-campaign
Every contact gets one. If a source doesn't fit, add a new canonical value (don't create variations).
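For data that arrives with free-text sources anyway (imports, third-party webhooks), map the variants onto the vocabulary at intake. A sketch; the variant list is illustrative and should grow as new variants surface:

```javascript
// Map free-text source variants onto the controlled vocabulary.
// The variant list is a sample; extend it as new variants appear.
const SOURCE_MAP = {
  fb: 'facebook-ads',
  facebook: 'facebook-ads',
  'meta ads': 'facebook-ads',
  adwords: 'google-ads',
  'google ads': 'google-ads',
  seo: 'organic-search',
  'client referral': 'referral-customer',
  friend: 'referral-customer',
};

function canonicalSource(raw) {
  const key = raw.trim().toLowerCase();
  return SOURCE_MAP[key] ?? key; // unmapped values pass through for review
}
```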
Automating normalization
At form submission (best)
Normalize on the way in. GHL's form builder, HubSpot's forms, Typeform — all let you set field types that force format (phone as phone, email as email).
Via webhook middleware (when forms can't)
If you're ingesting data from external sources (CSV imports, third-party webhooks), route through Make.com or n8n and normalize there before writing to the CRM.
Periodic cleanup (for existing data)
Scheduled workflow: every Sunday, run through all contacts updated in the past week. Normalize phone, email, name. Write back.
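The core of that workflow is one pure function applied to each contact. A sketch of the shape it might take in an n8n Code node; field names are examples, and it assumes phones already carry a country code:

```javascript
// Illustrative weekly-cleanup transform. Field names are examples;
// assumes stored phones already include a country code.
function normalizeContact(c) {
  return {
    ...c,
    email: c.email ? c.email.trim().toLowerCase() : c.email,
    firstName: c.firstName
      ? c.firstName.trim().replace(/\s+/g, ' ').toLowerCase()
          .replace(/\b\w/g, ch => ch.toUpperCase())
      : c.firstName,
    phone: c.phone ? '+' + c.phone.replace(/\D/g, '') : c.phone,
  };
}
```

Keeping the transform pure (input in, normalized copy out) makes it trivial to dry-run against an export before letting it write back.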
The audit process
Quarterly:
- Export all contacts
- Check phone format consistency (should be 100% E.164, or at least one consistent format)
- Check email case consistency (should be all lowercase)
- Check name formatting (should be title case)
- Count source variations — should be your canonical 10-20, not 50
- Identify duplicates (same email case-insensitive, same normalized phone)
If the audit reveals mess: fix the normalization process. Don't just clean the data — the mess will come back next quarter if the intake process is unchanged.
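The first three checks can be scripted over the export. A sketch; field names are examples and should match your export's columns:

```javascript
// Quick audit metrics over an exported contact list.
// Field names (phone, email, source) are examples.
function auditContacts(contacts) {
  const e164 = /^\+[1-9]\d{7,14}$/;
  return {
    badPhones: contacts.filter(c => c.phone && !e164.test(c.phone)).length,
    mixedCaseEmails: contacts.filter(c => c.email && c.email !== c.email.toLowerCase()).length,
    sourceVariants: new Set(contacts.map(c => c.source).filter(Boolean)).size,
  };
}
```

All three numbers should trend toward zero bad phones, zero mixed-case emails, and a stable source count quarter over quarter.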
Real example: deduplicating after a CSV import
Scenario: your team imported 5,000 contacts from a lead list. 8% are duplicates of existing records.
Step 1: normalize all phone numbers (new + existing)
Run the normalization pass. Now duplicates surface.
Step 2: find duplicates by normalized phone + email
SQL query (if using Supabase):
SELECT normalized_phone, COUNT(*)
FROM contacts
GROUP BY normalized_phone
HAVING COUNT(*) > 1;
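If the data lives in the CRM rather than a queryable database, the same grouping works in memory. A sketch; the field name is illustrative:

```javascript
// In-memory equivalent of the SQL grouping above.
// Assumes each contact carries a normalized_phone field.
function findDuplicateGroups(contacts) {
  const byPhone = new Map();
  for (const c of contacts) {
    if (!c.normalized_phone) continue; // skip contacts with no phone
    const group = byPhone.get(c.normalized_phone) ?? [];
    group.push(c);
    byPhone.set(c.normalized_phone, group);
  }
  // only groups with more than one member are duplicates
  return [...byPhone.values()].filter(g => g.length > 1);
}
```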
Step 3: merge duplicates
Keep the record with the most recent activity. Copy fields from the older record if newer has gaps. Delete the older.
Step 4: prevent future duplicates
Add unique constraint on normalized_phone and normalized_email in your database. Fail imports that would create duplicates.
Common mistakes
1. Normalizing only on display, not on storage. If your UI lowercases emails but the database stores them mixed-case, searches by email will fail.
2. Skipping normalization because "we'll fix it later." You won't. Bad data compounds. Fix at intake.
3. Over-normalizing and losing information. Stripping Gmail +tags might be useful for deduplication but loses the tag data. Store both the original and the normalized version if the original matters.
4. Normalizing in application code instead of at the database layer. If each app normalizes on its own, the rules drift out of sync. Enforce at the database via triggers or computed columns so every writer applies the same rules.
Sources
This post draws from publicly documented best practices for data normalization (ITU-T E.164 for phone numbers, RFC 5321 for emails, USPS Publication 28 for US addresses, ISO 3166-1 for country codes). Implementation examples are standard patterns used across Make.com, n8n, and custom code deployments.
Need help auditing and normalizing your CRM data? Let's talk — a typical normalization cleanup is a 1-2 day engagement.
Need This Built?
Ready to implement this for your business?
Everything in this article reflects real systems I've built and operated. Let's talk about yours.
Haroon Mohamed
Full-stack automation, AI, and lead generation specialist. 2+ years running 13+ concurrent client campaigns using GoHighLevel, multiple AI voice providers, Zapier, APIs, and custom data pipelines. Founder of HMX Zone.