AI Voice · 5 min read · 4 April 2026

VAPI Pricing Explained: How Every Component of Your Bill Is Actually Calculated

A transparent breakdown of VAPI's per-minute billing model — platform fee, LLM tokens, TTS audio, STT processing, and telephony — so you can predict costs before you scale.

Haroon Mohamed

AI Automation & Lead Generation

VAPI bills per-minute — but that's not the whole story

When you first deploy a VAPI agent, you look at the pricing page and see "$0.05/min platform fee." You think: at 1,000 minutes/month, that's $50. Easy.

Then your first invoice arrives and it's $600.

The reason: VAPI's platform fee is only one of five cost components that make up every minute of call time. The other four are pass-through costs that VAPI doesn't set, but they still show up on your bill.

Here's each component, what it actually costs at current public rates, and where you have control over the number.


Component 1: VAPI platform fee

Current rate (per VAPI's public pricing, 2026): $0.05/minute

This is VAPI's own margin. It covers the infrastructure that orchestrates your call — routing audio between the LLM, STT, TTS, and telephony provider.

Control: None. This is baked in.

Optimization: Long-term, this is the one cost you can't drive down. Everything else is tunable.


Component 2: LLM inference

Current rates (OpenAI, April 2026):

  • GPT-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens
  • GPT-4o Mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens

Current rates (Anthropic):

  • Claude Sonnet 4.5: $3 / 1M input, $15 / 1M output
  • Claude Haiku 4.5: $1 / 1M input, $5 / 1M output

How it adds up: Every turn of conversation sends the full conversation history plus the system prompt to the LLM. A 4-minute call typically consists of 15–25 turns. If your system prompt is 1,500 tokens and each turn adds ~200 tokens of history, the cumulative input tokens for the call come to roughly 45,000 at 15 turns and about 100,000 at 25, because the re-sent history grows with every turn.
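To see why input tokens pile up so fast, here is a minimal sketch of the accumulation. It assumes the simplest possible model: every turn re-sends the system prompt plus all history so far, each turn adds a flat 200 tokens, and there is no prompt caching or history truncation. The function name and defaults are illustrative, not anything from VAPI's API.

```python
def cumulative_input_tokens(turns: int,
                            system_prompt_tokens: int = 1500,
                            tokens_per_turn: int = 200) -> int:
    """Total input tokens billed across a call when every turn re-sends
    the system prompt plus the whole conversation so far (assumed model)."""
    total = 0
    for turn in range(1, turns + 1):
        # History up to and including the current turn.
        history_tokens = tokens_per_turn * turn
        total += system_prompt_tokens + history_tokens
    return total


print(cumulative_input_tokens(15))  # 46500
```

Because the history term grows with every turn, doubling the number of turns more than doubles the input tokens. Plug in your own prompt size and turn count before trusting any rule of thumb.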

Example calculation:

  • 4-minute call with GPT-4o
  • 50,000 input tokens + 3,000 output tokens across the conversation
  • Cost: (50,000 × $2.50/1M) + (3,000 × $10/1M) = $0.125 + $0.030 = $0.155/call
  • Per minute: $0.04/min

Same call with GPT-4o Mini: about $0.009/call, or under $0.003/min, a reduction of more than 90%.
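The same arithmetic as a small sketch, so you can swap in your own token counts. The rates are the per-million-token prices listed above. The function name is made up for illustration, and the 4-minute call length is the one used in the example.

```python
def llm_cost_per_call(input_tokens: int, output_tokens: int,
                      input_rate_per_million: float,
                      output_rate_per_million: float) -> float:
    """LLM cost in dollars for one call, given per-1M-token rates."""
    return (input_tokens * input_rate_per_million
            + output_tokens * output_rate_per_million) / 1_000_000


CALL_MINUTES = 4  # call length used in the example above

gpt4o = llm_cost_per_call(50_000, 3_000, 2.50, 10.00)
gpt4o_mini = llm_cost_per_call(50_000, 3_000, 0.15, 0.60)

for name, cost in [("GPT-4o", gpt4o), ("GPT-4o Mini", gpt4o_mini)]:
    print(f"{name}: ${cost:.4f}/call, ${cost / CALL_MINUTES:.5f}/min")
```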

Control: Full. You choose the model in your VAPI agent config.


Component 3: Text-to-Speech (TTS)

Current rates (April 2026, approximate):

  • ElevenLabs: $0.18 per 1,000 characters (flagship tier)
  • Cartesia Sonic: $0.025 per 1,000 characters
  • Rime AI: $0.04 per 1,000 characters
  • Azure Neural: $0.016 per 1,000 characters
  • PlayHT: $0.05 per 1,000 characters

How it adds up: TTS is billed per character of text sent for synthesis, not per minute of call. A 4-minute call where the AI speaks for 90 seconds typically sends 1,500–2,500 characters of text to the TTS provider.

Example calculation (2,000 characters):

  • ElevenLabs: 2,000 × $0.00018 = $0.36/call = $0.09/min
  • Cartesia: 2,000 × $0.000025 = $0.05/call = $0.0125/min
  • Azure: 2,000 × $0.000016 = $0.032/call = $0.008/min

That's roughly an 11x spread between the cheapest provider (Azure) and the most expensive (ElevenLabs) for the exact same call.
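Here is a rough sketch that runs the same per-character calculation across all five providers listed above. The rates are the approximate figures from this section. The 2,000-character, 4-minute call is the example's assumption, and the function and dictionary names are illustrative.

```python
# Approximate rates from the list above, in dollars per 1,000 characters.
TTS_RATE_PER_1K_CHARS = {
    "ElevenLabs": 0.18,
    "Cartesia Sonic": 0.025,
    "Rime AI": 0.04,
    "Azure Neural": 0.016,
    "PlayHT": 0.05,
}


def tts_cost_per_call(characters: int, rate_per_1k_chars: float) -> float:
    """TTS cost in dollars for the text synthesized during one call."""
    return characters / 1000 * rate_per_1k_chars


CALL_MINUTES = 4
CHARACTERS = 2000

# Print providers from cheapest to most expensive.
for provider, rate in sorted(TTS_RATE_PER_1K_CHARS.items(), key=lambda kv: kv[1]):
    cost = tts_cost_per_call(CHARACTERS, rate)
    print(f"{provider:<15} ${cost:.3f}/call  ${cost / CALL_MINUTES:.4f}/min")
```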

Control: Full. You select the TTS provider in VAPI config.


Component 4: Speech-to-Text (STT)

Current rates (April 2026):

  • Deepgram Nova-3: $0.0043/minute (streaming)
  • AssemblyAI Universal: $0.0037/minute
  • OpenAI Whisper (via API): $0.006/minute
  • Google Cloud STT: $0.016/minute (enhanced model)

Control: Full. VAPI supports all major providers.

Note: STT is the smallest cost component for most calls. The difference between the cheapest and most expensive is usually only $0.01–$0.02 per minute. Not worth obsessing over.


Component 5: Telephony

Current rates (April 2026):

  • Twilio US phone number: $1.15/month rental
  • Twilio US outbound call: $0.014/minute (SMS, if used, is billed separately per segment)
  • Twilio Toll-free: $2.00/month rental + $0.019/min outbound
  • VAPI native telephony (if used): $0.03/min (higher, but simpler)

Additional costs most people miss:

  • A2P 10DLC brand registration (Twilio): $4 one-time + $10/month per campaign
  • SHAKEN/STIR registration: included in most Twilio plans, and required for high-volume outbound calling
  • CNAM display (your business name on caller ID): $5–$15/month per number

Control: Partial. You choose the provider, but the per-minute rate is set by carriers.
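The fixed monthly charges matter most at low volume. As a hedged sketch, here is one way to fold them into an effective per-minute rate. It assumes one local number ($1.15/month), one 10DLC campaign ($10/month), and CNAM at $10/month (the midpoint of the $5–$15 range, picked purely for illustration). The one-time $4 brand registration is left out, and the function name is hypothetical.

```python
def telephony_cost_per_minute(minutes_per_month: float,
                              per_minute_rate: float = 0.014,
                              fixed_monthly: tuple = (1.15, 10.00, 10.00)) -> float:
    """Effective telephony cost per minute once fixed monthly fees
    (number rental, 10DLC campaign, CNAM; assumed values) are spread
    over the month's call minutes."""
    return per_minute_rate + sum(fixed_monthly) / minutes_per_month


for minutes in (1_000, 10_000):
    rate = telephony_cost_per_minute(minutes)
    print(f"{minutes:>6} min/month -> ${rate:.4f}/min")
```

At 1,000 minutes a month the fixed fees more than double the headline $0.014 rate. At 10,000 minutes they nearly disappear.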


Putting it all together

A 4-minute call with an optimized stack in 2026:

| Component | Cost |
|-----------|------|
| VAPI platform | $0.20 |
| LLM (GPT-4o Mini) | $0.012 |
| TTS (Cartesia) | $0.05 |
| STT (Deepgram) | $0.017 |
| Telephony (Twilio) | $0.056 |
| Total | $0.335/call |
| Per minute | $0.084/min |

Same call with default/premium settings:

| Component | Cost |
|-----------|------|
| VAPI platform | $0.20 |
| LLM (GPT-4o) | $0.16 |
| TTS (ElevenLabs) | $0.36 |
| STT (Deepgram) | $0.017 |
| Telephony (Twilio) | $0.056 |
| Total | $0.793/call |
| Per minute | $0.198/min |

At 10,000 minutes/month, that's $840 vs $1,980 — a $1,140/month difference for the same functional output.
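If you'd rather check the totals than take them on faith, here is a small sketch that adds up the two stacks from the per-call figures above. It uses Python's `decimal` module so the rounding matches the tables: per-minute cost is rounded to three decimals before scaling to monthly volume. The dictionary and function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

CALL_MINUTES = 4

# Per-call component costs in dollars, copied from the two tables above.
OPTIMIZED = {
    "VAPI platform": "0.20",
    "LLM (GPT-4o Mini)": "0.012",
    "TTS (Cartesia)": "0.05",
    "STT (Deepgram)": "0.017",
    "Telephony (Twilio)": "0.056",
}
PREMIUM = {
    "VAPI platform": "0.20",
    "LLM (GPT-4o)": "0.16",
    "TTS (ElevenLabs)": "0.36",
    "STT (Deepgram)": "0.017",
    "Telephony (Twilio)": "0.056",
}


def summarize(stack: dict, monthly_minutes: int = 10_000):
    """Return (cost per call, cost per minute, monthly cost) for a stack."""
    per_call = sum(Decimal(cost) for cost in stack.values())
    per_minute = (per_call / CALL_MINUTES).quantize(
        Decimal("0.001"), rounding=ROUND_HALF_UP)
    return per_call, per_minute, per_minute * monthly_minutes


for name, stack in (("Optimized", OPTIMIZED), ("Premium", PREMIUM)):
    per_call, per_minute, monthly = summarize(stack)
    print(f"{name}: ${per_call}/call, ${per_minute}/min, ${monthly:.0f}/month")
```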


The pricing-page trap

When evaluating VAPI vs. competitors (Retell, Bland, Vocode), don't compare platform fees. Compare the full stack cost at your usage volume with your chosen providers. A platform with a $0.07/min fee but better default pricing on LLM/TTS can easily be cheaper overall than one with a $0.04/min fee and expensive defaults.

Always build a spreadsheet with your actual config before choosing a platform.
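If a spreadsheet feels heavy, a few lines of code make the same point. The sketch below compares two hypothetical platforms. Only the $0.07 and $0.04 platform fees come from the paragraph above. The other per-minute figures are borrowed from this article's earlier examples to stand in for "cheap defaults" and "expensive defaults", and they are not real quotes from any vendor.

```python
def full_stack_per_minute(platform_fee: float, llm: float, tts: float,
                          stt: float, telephony: float) -> float:
    """Sum of all five per-minute cost components, in dollars."""
    return platform_fee + llm + tts + stt + telephony


# Hypothetical platform A: higher fee, cheap defaults (Mini-class LLM, Cartesia-class TTS).
platform_a = full_stack_per_minute(0.07, llm=0.003, tts=0.0125, stt=0.0043, telephony=0.014)

# Hypothetical platform B: lower fee, expensive defaults (GPT-4o-class LLM, ElevenLabs-class TTS).
platform_b = full_stack_per_minute(0.04, llm=0.04, tts=0.09, stt=0.0043, telephony=0.014)

MONTHLY_MINUTES = 10_000
for name, rate in (("Platform A", platform_a), ("Platform B", platform_b)):
    print(f"{name}: ${rate:.4f}/min, ${rate * MONTHLY_MINUTES:,.0f}/month")
```

With these assumed inputs, the platform with the higher fee comes out well ahead, which is exactly why the platform fee alone tells you very little.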


Where to verify these numbers

All rates above are pulled from each provider's public pricing page as of April 2026. Verify them yourself:

  • VAPI: vapi.ai/pricing
  • OpenAI: openai.com/api/pricing
  • Anthropic: anthropic.com/pricing
  • Cartesia: cartesia.ai/pricing
  • ElevenLabs: elevenlabs.io/pricing
  • Deepgram: deepgram.com/pricing
  • Twilio: twilio.com/pricing

Prices change frequently. Check these pages before building a business case.

Want help modeling the full cost stack for your specific use case? Get in touch — I've done this spreadsheet exercise for 13+ client deployments.

Haroon Mohamed

Full-stack automation, AI, and lead generation specialist. 2+ years running 13+ concurrent client campaigns using GoHighLevel, multiple AI voice providers, Zapier, APIs, and custom data pipelines. Founder of HMX Zone.
