AI Voice · 8 min read · 8 October 2025

AI Calling Agents: How We Cut Call Costs by 77% Without Touching Quality

A detailed breakdown of how I reduced AI calling costs from $1.50/min to $0.35/min through prompt optimisation, provider switching, and conversation flow tuning — without losing call quality.

Haroon Mohamed

AI Automation & Lead Generation

The cost that was eating the margin

AI calling agents are powerful. They're also expensive if you build them naively.

When we first deployed a VAPI-based qualification agent for a solar client, the cost was sitting at $1.50/minute. On 200 calls per day at an average call length of 4 minutes, that's $1,200/day — $36,000/month — just for call infrastructure.

That's before Twilio, before GoHighLevel, before any other tool cost. That's just the AI call processing.

After three months of systematic optimisation, the cost was $0.35/minute. Same call quality. Same qualification accuracy. Same conversion rate from call to booked appointment.

$1.50/min vs $0.35/min on 200 calls/day × 4 minutes (800 minutes/day) is $1.15/min saved: roughly $920/day, $27,600/month, and over $330,000/year.

Here's exactly how we got there.


Understanding where the cost comes from

VAPI, like most AI calling platforms, bills per minute based on the components in your stack:

  1. LLM inference cost — The language model processing the conversation (GPT-4, Claude, etc.)
  2. TTS (text-to-speech) cost — Converting the AI's text response to audio (ElevenLabs, Cartesia, etc.)
  3. STT (speech-to-text) cost — Converting the caller's speech to text (Deepgram, AssemblyAI, etc.)
  4. Platform fee — VAPI's own per-minute fee on top
  5. Telephony cost — Twilio or VAPI's own calling numbers

The default VAPI setup uses:

  • GPT-4 (most expensive LLM tier)
  • ElevenLabs (expensive TTS with premium voices)
  • Deepgram (reasonable, but at default tier)
  • VAPI platform fee (~$0.05/min)
  • Twilio telephony

Adding all these up at default settings gives you $1.20–$1.80/minute. The range depends on conversation length and LLM response length.
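The arithmetic is simple enough to keep in a spreadsheet or a few lines of code. A minimal sketch of the per-minute cost model, using illustrative placeholder rates (not quoted provider prices):

```python
# Per-minute cost model for an AI calling stack.
# All rates are illustrative placeholders, not quoted provider prices.
DEFAULT_STACK = {
    "llm": 0.65,        # GPT-4-class inference, $/min
    "tts": 0.30,        # premium TTS voice, $/min
    "stt": 0.12,        # transcription at default tier, $/min
    "platform": 0.18,   # calling-platform fee, $/min
    "telephony": 0.05,  # carrier cost, $/min
}

def cost_per_minute(stack: dict) -> float:
    """Total per-minute rate is the sum of the component rates."""
    return sum(stack.values())

print(f"${cost_per_minute(DEFAULT_STACK):.2f}/min")  # lands in the $1.20-$1.80 range
```

Making each component an explicit line item is what makes the rest of this article possible: you can only optimise what you can see.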

The opportunity: every single component is configurable. And the cheapest option in each category often has equivalent quality for a structured qualification use case.


Optimisation #1: Switch the LLM

The biggest single cost reduction came from switching the LLM.

From: GPT-4 (~$0.03/1K tokens input, $0.06/1K tokens output)
To: GPT-4o Mini (~$0.00015/1K tokens input, $0.0006/1K tokens output)

That's a 100–200x reduction in LLM costs, depending on your input/output mix.
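To see what that means per call, here's a back-of-envelope comparison at those per-token rates. The token counts are assumptions for a 4-minute call (the system prompt and history are resent on every turn, so input tokens dominate), not measured values:

```python
# Rough per-call LLM cost comparison at the per-1K-token rates quoted above.
RATES_PER_1K = {                       # (input, output) in $/1K tokens
    "gpt-4": (0.03, 0.06),
    "gpt-4o-mini": (0.00015, 0.0006),
}

def call_llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = RATES_PER_1K[model]
    return (input_tokens / 1000) * inp + (output_tokens / 1000) * out

# Assumed totals: ~40K input tokens (context resent each turn), ~2K output.
for model in RATES_PER_1K:
    print(f"{model}: ${call_llm_cost(model, 40_000, 2_000):.4f}/call")
# gpt-4 comes out around $1.32/call; gpt-4o-mini under a cent.
```

Run your own transcript token counts through this and the LLM line item stops being abstract.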

The concern with switching to a smaller model: will the AI agent be dumber? Will it misunderstand leads? Will it go off-script?

The answer, for a structured qualification call, is: not meaningfully.

A qualification call has a fixed script. The AI asks 8 predetermined questions. It categorises responses into buckets (Yes/No, numeric ranges, multiple choice). It doesn't need to reason deeply about novel situations. It needs to execute a structured conversation reliably.

GPT-4o Mini does this perfectly well. The conversation feels identical to GPT-4. The qualification accuracy didn't drop.

Where you'd keep GPT-4: If your AI agent needs to handle complex objections, make nuanced judgment calls, or adapt significantly to unexpected conversation directions. For structured qualification, the smaller model is sufficient.
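In practice the swap itself is a one-field change in the assistant configuration. A sketch of the relevant fragment (field names follow VAPI's assistant schema as I understand it — verify against the current API docs before relying on them):

```json
{
  "model": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "temperature": 0.3
  }
}
```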

LLM cost reduction: ~$0.60/min saved


Optimisation #2: Switch the voice provider

ElevenLabs has premium voice quality — genuinely impressive. But for a qualification call that's going to run at scale, you don't need premium.

From: ElevenLabs (premium tier, ~$0.30/min)
To: Cartesia (~$0.04/min)

Cartesia's voices are good. Not ElevenLabs level, but for a professional business call, they're entirely appropriate. The test I use: would someone pause the conversation because the voice sounds robotic? With Cartesia at default settings: no.

We tested this on 50 real calls, comparing the exact same script with ElevenLabs vs. Cartesia. Completion rate, qualification rate, and caller satisfaction (measured by whether they stayed on the call and engaged) were statistically identical.
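"Statistically identical" can be checked with a standard two-proportion z-test on the completion counts. A minimal sketch with illustrative numbers (not the actual test results from those 50 calls):

```python
# Two-proportion z-test on completion rates from an A/B split.
# The counts below are illustrative, not the real experiment data.
from math import sqrt

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)        # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))     # pooled standard error
    return (p_a - p_b) / se

# e.g. 21/25 completions on ElevenLabs vs 20/25 on Cartesia
z = two_proportion_z(21, 25, 20, 25)
print(f"z = {z:.2f}")  # |z| < 1.96 means no significant difference at p < .05
```

One honest caveat: with 25 calls per arm the test can only rule out large effects. For this decision that's the question that matters — is the cheap voice noticeably worse — so a small sample is informative.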

The voice is a smaller part of call quality than most people assume. The script, the pacing, and the relevance of the questions matter more than whether the TTS voice is "premium."

Alternative: Rime AI is another solid option at comparable pricing. Worth A/B testing against Cartesia for your specific use case.

TTS cost reduction: ~$0.26/min saved


Optimisation #3: Prompt engineering to reduce LLM token usage

This is often overlooked because it requires understanding how LLMs are billed.

LLMs charge per token — both input (your system prompt + conversation history) and output (the AI's responses).

The original system prompt was 1,800 tokens. It included:

  • Full company background
  • Detailed script with every possible response
  • Long instructions for edge cases
  • Examples for how to handle various objections

Every single API call during the conversation sent this entire 1,800-token prompt as context.

What we changed:

  1. Reduced the system prompt to 600 tokens — keeping only what the model actually needed to execute the call
  2. Removed examples (the model didn't need them for a structured script)
  3. Made the AI's responses shorter — instructed it to give concise, natural responses rather than verbose ones
  4. Moved static reference information (company details, pricing ranges) into a separate tool call that only fired when needed, rather than being in the system prompt always
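The "prompt resent every turn" multiplier is the part people underestimate. A dependency-free sketch of the audit, using the common ~4-characters-per-token rule of thumb (use tiktoken for exact counts):

```python
# Rough prompt-size audit: ~4 characters per token is a common heuristic
# for English text; tiktoken gives exact counts if you need precision.
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def per_call_prompt_tokens(system_prompt: str, turns: int) -> int:
    """The system prompt is resent as context on every conversation turn."""
    return approx_tokens(system_prompt) * turns

bloated = "x" * 7200   # stands in for a ~1,800-token prompt
trimmed = "x" * 2400   # stands in for a ~600-token prompt
for name, prompt in [("bloated", bloated), ("trimmed", trimmed)]:
    print(name, per_call_prompt_tokens(prompt, turns=20), "prompt tokens/call")
```

At 20 turns, the 1,200-token trim saves roughly 24,000 input tokens per call before counting shorter responses — which is why a prompt diet moves the per-minute number at all.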

Result: Average token usage per call dropped by ~55%. On a 4-minute call, that's significant.

Token reduction savings: ~$0.20/min saved


Optimisation #4: Conversation flow tuning to reduce call length

Call length × cost per minute = total cost. Reducing average call length from 5 minutes to 3.5 minutes (a 30% reduction) is equivalent to a 30% cost reduction at the same per-minute rate.

We analysed 200 call transcripts and found:

Time waster #1: Unclear transitions

The original script had vague transitions like "Great, let me ask you a few more questions." This led to confused pauses, leads asking "sorry, what was that?", and re-asks. We tightened every transition to be direct: "Okay, next question—"

Time waster #2: AI over-explaining

The AI was trained to be conversational, so it would say things like "That's great to hear! Many homeowners in [state] have found that..." before asking the next question. Filler. Removed.

Time waster #3: Re-confirming information unnecessarily

At the end of the call, the original script had the AI repeat back 5 fields of information for confirmation. We cut this to the two most important fields (name and callback number) and moved the rest to an automatic SMS confirmation.

Time waster #4: Handling wrong numbers incorrectly

When someone said "I think you have the wrong number," the original AI tried to verify the lead's information anyway. This created a painful 45-second conversation before the AI gave up. We added explicit intent detection: if the lead expresses confusion about being called, end gracefully within 15 seconds.
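The wrong-number check can start as simple phrase matching before you reach for anything smarter. A minimal sketch (the phrase list and function name are illustrative; in production this can live as a prompt instruction or a lightweight classifier):

```python
# Minimal wrong-number intent check via phrase matching.
# Phrase list is illustrative; extend it from your own transcripts.
WRONG_NUMBER_PHRASES = (
    "wrong number", "who is this", "didn't sign up",
    "never requested", "stop calling",
)

def is_wrong_number(utterance: str) -> bool:
    text = utterance.lower()
    return any(phrase in text for phrase in WRONG_NUMBER_PHRASES)

assert is_wrong_number("I think you have the wrong number")
assert not is_wrong_number("Yes, I filled in the solar form yesterday")
```

The point isn't sophistication — it's having an explicit exit path so the agent stops trying to qualify someone who never asked to be called.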

Average call length reduction: 4.2 min → 3.1 min (-26%)

Call length cost reduction: ~$0.30/min equivalent


Optimisation #5: Smarter call scheduling

This one doesn't reduce per-minute cost — it reduces wasted calls entirely.

When we analysed our call data, we found:

  • Calls made Monday–Friday 10am–11am and 5pm–6:30pm local time had a 67% answer rate
  • Calls made Monday–Friday 2pm–4pm had a 31% answer rate
  • Weekend calls had a 22% answer rate
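Finding these windows is a one-pass aggregation over your call logs. A sketch with illustrative records (the field names `hour` and `answered` are assumptions about your export format):

```python
# Bucket call logs by local hour and compute the answer rate per bucket.
# The sample records and field names are illustrative.
from collections import defaultdict

calls = [
    {"hour": 10, "answered": True}, {"hour": 10, "answered": True},
    {"hour": 10, "answered": False}, {"hour": 14, "answered": False},
    {"hour": 14, "answered": True}, {"hour": 15, "answered": False},
]

def answer_rate_by_hour(calls: list) -> dict:
    buckets = defaultdict(lambda: [0, 0])      # hour -> [answered, total]
    for call in calls:
        buckets[call["hour"]][0] += call["answered"]
        buckets[call["hour"]][1] += 1
    return {h: answered / total for h, (answered, total) in buckets.items()}

for hour, rate in sorted(answer_rate_by_hour(calls).items()):
    print(f"{hour}:00 — {rate:.0%} answered")
```

Group by day-of-week as well before you commit to windows; the same hour can behave very differently on a Monday vs a Friday.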

We weren't failing to connect because of the AI agent quality. We were burning call volume on bad time windows.

By restricting calling hours to the two peak windows (and adding Saturday 10am–12pm as a test), our answer rate improved from 44% to 61%.

Same number of calls. 39% more conversations. Fewer minutes burned on voicemail pickups, which are still billed per minute even though no human answers.

Effective cost reduction: 30% fewer unproductive call minutes


The final cost breakdown

| Component | Before | After | Saving |
|-----------|--------|-------|--------|
| LLM | $0.65/min | $0.05/min | $0.60 |
| TTS | $0.30/min | $0.04/min | $0.26 |
| STT | $0.12/min | $0.08/min | $0.04 |
| Platform | $0.18/min | $0.10/min | $0.08 |
| Telephony | $0.05/min | $0.05/min | $0.00 |
| Prompt optimisation | — | — | $0.17 effective |
| **Total** | **$1.50/min** | **$0.35/min** | **$1.15/min (77%)** |

Note: The prompt optimisation savings are spread across LLM costs above and reflected in effective billing reduction.


What didn't change

Qualification accuracy. We tracked the rate at which leads qualified by the AI agent were confirmed as qualified by human closers. Before: 84%. After: 82%. Noise-level difference.

Caller experience. We surveyed a sample of leads post-call. No meaningful change in how the call was perceived.

Booking rate. Leads who passed qualification and were offered a calendar booking: consistent at 41%.

The entire optimisation was a pure cost reduction. The output — qualified leads delivered to closers — was identical.


The replication checklist

If you're running a VAPI-based AI calling operation and want to apply these:

  1. Audit your LLM selection. Are you using GPT-4 for a structured script? Switch to GPT-4o Mini or Claude Haiku. Test with 50 calls before committing.

  2. Audit your TTS provider. Are you on ElevenLabs? Test Cartesia or Rime. Run the same script on both, listen to 10 calls on each, compare.

  3. Trim your system prompt. Paste your current prompt into a token counter. If it's over 800 tokens for a qualification use case, you have bloat. Cut everything that isn't directly needed for the call.

  4. Analyse call transcripts for time wasters. Download 50 transcripts. Read them. You'll find the patterns quickly.

  5. Check your call timing data. What's your answer rate by hour of day and day of week? Restrict calling to the top two windows.

  6. Calculate your actual per-call cost. Most operators don't know this number. Calculate: (calls/day × avg call length × cost/min) = daily infrastructure cost. Make it visible.
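The step-6 calculation, made explicit so it can sit on a dashboard rather than in someone's head:

```python
# Daily AI-call infrastructure cost: calls/day x avg call length x cost/min.
def daily_infra_cost(calls_per_day: int, avg_call_minutes: float,
                     cost_per_minute: float) -> float:
    return calls_per_day * avg_call_minutes * cost_per_minute

before = daily_infra_cost(200, 4, 1.50)   # the naive stack
after = daily_infra_cost(200, 4, 0.35)    # the optimised stack
print(f"before ${before:,.0f}/day, after ${after:,.0f}/day")
```

At 200 calls/day × 4 minutes, those two rates are $1,200/day versus $280/day — the gap this whole article is about.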

The cost of running this properly matters. At scale, the difference between $1.50/min and $0.35/min is the difference between a sustainable operation and one that eats all its own margin.

Want help auditing and optimising your calling stack? Get in touch.

Need This Built?

Ready to implement this for your business?

Everything in this article reflects real systems I've built and operated. Let's talk about yours.

Haroon Mohamed

Full-stack automation, AI, and lead generation specialist. 2+ years running 13+ concurrent client campaigns using GoHighLevel, multiple AI voice providers, Zapier, APIs, and custom data pipelines. Founder of HMX Zone.
