AI Glossary
Token cost is simply the price you pay per million tokens (roughly 750,000 words) when using an AI model — and it can vary by 100x or more depending on which model you choose.
What it really means
When you send a prompt to an AI model like GPT-4 or Claude, you’re not paying by the request — you’re paying by the token. A token is a chunk of text, roughly 4 characters or 0.75 words in English. Every word you send in (input) and every word the AI generates (output) gets counted as tokens, and you’re billed per million tokens.
Think of it like a water bill. You don’t pay per faucet turn — you pay per gallon. Token cost is your per-gallon rate. The difference is that some models charge $0.15 per million input tokens (cheap, fast models like GPT-4o Mini) while others charge $15 per million (powerful reasoning models like o1). That’s a 100x spread for the same basic task.
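For a rough sketch of that math, here is a minimal Python ballpark calculator. The 4-characters-per-token rule of thumb comes straight from the estimate above — real tokenizers vary by model, so treat the numbers as estimates, not invoices:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token in English."""
    return max(1, round(len(text) / 4))

def cost_usd(tokens: int, price_per_million: float) -> float:
    """Convert a token count to dollars at a per-million-token rate."""
    return tokens / 1_000_000 * price_per_million

prompt = "Please draft a friendly reminder for tomorrow's 9am appointment."
tokens = estimate_tokens(prompt)

# Same prompt, two of the example rates from above: a 100x spread.
print(f"{tokens} tokens: ${cost_usd(tokens, 0.15):.8f} cheap model")
print(f"{tokens} tokens: ${cost_usd(tokens, 15.00):.8f} premium model")
```

The point the code makes concrete: at these volumes the absolute numbers are tiny either way — model choice only starts to matter financially once you multiply by thousands of requests.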
I’ve seen business owners get blindsided by this. They test a model on a few prompts, it works great, so they roll it out to their whole customer service team — and the bill jumps from $20 to $2,000 overnight. Understanding token cost upfront is how you avoid that surprise.
Where it shows up
Token cost appears in two places on your invoice:
- Input cost — what you pay for the prompt you send in (including system instructions, conversation history, and the user’s question)
- Output cost — what you pay for the AI’s response (usually 2-4x more expensive per token than input)
Most AI providers publish their pricing as “per million tokens.” For example, GPT-4o might list $2.50 per million input tokens and $10 per million output tokens. That means a 500-word email (roughly 670 tokens) costs about $0.0017 in input and $0.0067 in output — pennies per email. But if you’re running 10,000 emails a day, that’s $84 daily or $2,500 monthly.
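The email math above can be reproduced in a few lines. The rates are the example GPT-4o figures just quoted ($2.50 per million input, $10 per million output) — check your provider's current pricing before budgeting off them:

```python
INPUT_RATE = 2.50    # $ per million input tokens (example rate)
OUTPUT_RATE = 10.00  # $ per million output tokens (example rate)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output are billed at different rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# ~500-word email (~670 tokens) in, similar-length reply out
per_email = request_cost(670, 670)
print(f"per email: ${per_email:.4f}")              # fractions of a cent
print(f"10,000 emails/day: ${per_email * 10_000:.2f}/day")
```

Notice that the output side dominates even though the token counts are equal — a direct consequence of output tokens costing 4x more at these rates.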
Different models have different cost structures. Smaller, faster models (like GPT-4o Mini or Claude Haiku) are cheap. Large reasoning models (like o1 or Claude Opus) are expensive. And specialized models for images or audio have their own per-token rates.
Common SMB use cases
Here’s how token cost plays out for real Central Florida businesses:
- A Winter Park dental practice uses AI to draft appointment reminder emails and answer FAQs. Each patient interaction uses maybe 300 tokens. At $0.15 per million tokens, their monthly bill is under $5.
- An HVAC company in Maitland has a chatbot on their website that handles 200 customer inquiries a day. Each conversation averages 2,000 tokens (including history). That’s 400,000 tokens daily, or 12 million monthly. At $2.50 per million input, that’s $30/month — manageable.
- A law firm in downtown Orlando uses AI to summarize depositions and draft contract clauses. Each document is 10,000+ tokens, and they process 50 a week. That’s 2 million tokens monthly, and they use a premium model at $15 per million — so about $30/month for a task that saves them 40 hours of paralegal time.
- A Lake Nona restaurant tried using AI to generate weekly social media posts. Each post is 200 tokens, so 800 tokens monthly. At any model’s pricing, that’s less than a penny. But the real cost is the time spent editing the output — not the tokens.
The pattern is clear: for most SMBs, token cost is negligible until you hit high volume. The real expense is usually in setup and oversight, not the per-token bill.
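A quick way to sanity-check scenarios like these before a rollout is to model the monthly volume. This sketch reproduces the HVAC chatbot figure from above (the 30-day month and steady daily volume are simplifying assumptions):

```python
def monthly_cost(interactions_per_day: float, tokens_each: int,
                 price_per_million: float, days: int = 30) -> float:
    """Estimated monthly token bill for a steady daily workload."""
    monthly_tokens = interactions_per_day * tokens_each * days
    return monthly_tokens / 1_000_000 * price_per_million

# HVAC chatbot: 200 chats/day x 2,000 tokens each at $2.50/M input
print(monthly_cost(200, 2_000, 2.50))   # → 30.0 (dollars/month)
```

Running your own daily volumes through a back-of-the-envelope like this usually confirms the pattern: the token bill stays small until volume gets genuinely large.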
Pitfalls (what gets oversold)
Here’s what I’ve seen trip people up:
- “It’s free to start.” Many providers offer a free tier with limited tokens. Once you exceed that, the meter starts running fast. A dental practice I worked with burned through their $5 free credit in two days testing a chatbot and didn’t realize it.
- “Just use the cheapest model.” Cheap models are fine for simple tasks like drafting emails. But for complex reasoning (legal analysis, medical triage, financial calculations), they produce errors that cost you more in rework than you saved on tokens.
- “Output cost doesn’t matter.” It does. If your AI generates long responses (like full reports or detailed analysis), output tokens can run 10-20x your input tokens. A 100-word prompt might generate a 2,000-word response — 20x the input — and every one of those output tokens is billed at the higher rate.
- “I’ll just buy the unlimited plan.” Provider “unlimited” plans often have hidden caps or throttling. One auto shop in Sanford signed up for a $200/month “unlimited” plan, hit 50,000 tokens in a day, and got rate-limited to one response per minute.
- “Tokens are the only cost.” They’re not. You also pay for API calls, storage, and sometimes per-user licensing. Token cost is just one line item.
The biggest oversell I hear: “AI is practically free.” It is free for small tests. But at scale — 10,000 customer interactions a month — token cost becomes a real line item you need to budget for.
Related terms
- Context window — The maximum number of tokens a model can process at once. Larger windows mean higher input costs per request.
- Token limit — The cap on how many tokens a model can generate in a single response. Hitting this can truncate your output.
- Prompt engineering — Writing prompts that use fewer tokens (shorter, more precise) to reduce cost without losing quality.
- Model tier — The category of model (mini, standard, premium) that determines both capability and token cost.
- API pricing — The full cost structure including token cost, plus any per-request fees or data transfer charges.
Want help with this in your business?
If you’re trying to figure out what your actual token costs would look like for your business, I’m happy to run the numbers with you — just email me or use the contact form on this site.