Tokens (AI)

AI Glossary

Think of tokens as the individual puzzle pieces an AI model uses to read and write — and the unit you’re billed for, usually 3–4 characters each.

What it really means

When you type something into an AI tool like ChatGPT or Claude, the model doesn’t read your words the way you do. It breaks everything down into tiny chunks called tokens. A token is typically about three or four characters — sometimes a whole word, sometimes just part of one. In many tokenizers, “Hello, world!” becomes four tokens: “Hello”, “,”, “ world”, and “!”.

Here’s the part that matters for your wallet: you pay per token. Every input you send and every output the model generates adds up. If you’re using an API directly (not a flat-fee subscription), your monthly bill is basically: input tokens × the input rate + output tokens × the (usually higher) output rate. For most small businesses, that’s pennies per task, but it can climb fast if you’re feeding long documents or running thousands of requests.

I’ve seen clients get confused because they think they’re paying “per word” or “per prompt.” Nope — it’s per token. A 500-word email might be 700 tokens. A 10-page contract could easily top 5,000 tokens. The model doesn’t care about your word count; it cares about its own internal chunk size.
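If you want to ballpark this yourself, the two rules of thumb above (roughly 4 characters per token, or roughly 1.3 tokens per word) are enough for budgeting. This is a sketch, not a real tokenizer — libraries like OpenAI’s tiktoken give exact counts, and different models tokenize differently:

```python
# Rough token and cost estimator using the heuristics above:
# ~4 characters per token, or ~1.3 tokens per word. A ballpark only —
# real tokenizers give exact (and model-specific) counts.

def estimate_tokens(text: str) -> int:
    """Estimate token count two ways and return the larger (safer for budgeting)."""
    by_chars = len(text) / 4
    by_words = len(text.split()) * 1.3
    return int(max(by_chars, by_words))

def estimate_cost(tokens: int, rate_per_million: float) -> float:
    """Cost in dollars at a $X-per-million-tokens rate."""
    return tokens / 1_000_000 * rate_per_million

email = "word " * 500                # stand-in for a 500-word email
tokens = estimate_tokens(email)
print(tokens)                        # → 650, the same ballpark as the ~700 above
print(estimate_cost(tokens, rate_per_million=0.50))  # a tiny fraction of a cent
```

The “take the larger estimate” choice is deliberate: when you’re budgeting, overestimating by a little beats a surprise bill.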

Where it shows up

Tokens are everywhere in AI, but you’ll notice them in three places:

  • Pricing pages — OpenAI, Anthropic, and others list their costs as “$X per million input tokens” and “$Y per million output tokens.” Output tokens are usually more expensive because generating text takes more compute.
  • Context windows — When a tool says it has a “128K token context window,” that’s the model’s short-term memory. It can “see” about 128,000 tokens at once. For reference, a typical novel is around 100,000 tokens.
  • Error messages — If you hit a “token limit exceeded” error, you’ve stuffed more text than the model can handle at once. You’ll need to trim your input or use a model with a bigger window.
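That “token limit exceeded” error is just arithmetic: system instructions + your document + room for the answer must fit inside the window. A minimal sketch of that check, using the ~4-characters-per-token heuristic (the window sizes and reserve amount here are illustrative assumptions, not any vendor’s exact numbers):

```python
# Sketch: will this document fit in a model's context window?
# Uses the rough 4-chars-per-token heuristic; all sizes are illustrative.

def fits_in_window(doc_chars: int, window_tokens: int,
                   system_tokens: int = 0, reserve_for_output: int = 1000) -> bool:
    """True if system prompt + document + reserved output fit in the window."""
    doc_tokens = doc_chars // 4
    return system_tokens + doc_tokens + reserve_for_output <= window_tokens

# A ~50-page manual (~120,000 characters, so ~30,000 tokens) in a 128K window:
print(fits_in_window(120_000, window_tokens=128_000))   # True — plenty of room
# The same manual against an 8K window:
print(fits_in_window(120_000, window_tokens=8_000))     # False — trim or split it
```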

Common SMB use cases

Understanding tokens helps you make smarter choices about cost and performance. Here’s how it plays out for local businesses:

  • Dental practice in Winter Park — A dentist wants to use AI to draft patient follow-up emails. Each email is short (maybe 200 tokens), so cost is negligible. But if she uploads a 50-page insurance manual to ask questions about a specific clause, that’s 30,000+ tokens just for the input — and she’ll pay for it every time she asks a new question about the same document.
  • HVAC company in Maitland — They’re building a chatbot for their website that answers common service questions. Each customer query plus the system instructions might be 1,000 tokens. At 500 queries a month, that’s 500,000 tokens — about $0.10–$0.50 depending on the model. Totally manageable.
  • Law firm in downtown Orlando — A paralegal feeds a 200-page contract into an AI to summarize key terms. That’s 100,000+ tokens in one shot. If they’re using a model with a 128K window, it fits, and the output summary might add another 5,000 tokens. That single task could cost a few dollars — not a big deal for a one-off, but a concern if they’re doing it daily.
  • Restaurant in Lake Nona — They want to use AI to generate weekly social media posts. Each post is maybe 300 tokens. Even 50 posts a month is only 15,000 tokens — pretty much free.

The pattern: short tasks are cheap. Long documents or repeated queries add up. Knowing your token usage helps you estimate costs before you commit.
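The estimate behind the HVAC example above is simple enough to sketch. The $0.20-per-million rate is a placeholder for a small, cheap model — check your provider’s pricing page for real numbers, and remember input and output tokens are usually priced differently:

```python
# Back-of-the-envelope monthly cost for a recurring AI task.
# The rate is a placeholder, not any vendor's actual pricing.

def monthly_cost(tokens_per_request: int, requests_per_month: int,
                 rate_per_million: float) -> float:
    """Total dollars per month for a repeated task at a flat per-token rate."""
    total_tokens = tokens_per_request * requests_per_month
    return total_tokens / 1_000_000 * rate_per_million

# HVAC chatbot: ~1,000 tokens per query, 500 queries a month, $0.20/M tokens
print(monthly_cost(1_000, 500, rate_per_million=0.20))   # → 0.1 (ten cents)
```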

Pitfalls (what gets oversold)

I hear a lot of hype around tokens that doesn’t match reality. Watch for these:

  • “We have a 1 million token context window!” — Sure, the model can technically see that much text. But in practice, it often “forgets” details in the middle of very long inputs. Bigger isn’t always better; it’s just more expensive.
  • “Tokens are the same as words.” — Nope. A token can be a word fragment, a punctuation mark, or a space. A 1,000-word email might be 1,300 tokens. Don’t assume a 1:1 ratio.
  • “You’ll save money with a bigger model.” — Larger models (like GPT-4) often cost 10–20x more per token than smaller ones (like GPT-3.5 or Claude Haiku). For simple tasks like drafting an email, the smaller model is fine. Don’t pay for a Ferrari to drive to the grocery store.
  • “Token limits are just a technical detail.” — They’re not. If you’re building a customer-facing chatbot and your system instructions alone take 5,000 tokens, you’ve got less room for the customer’s question. Plan your prompts carefully.

Related terms

  • Context window — The maximum number of tokens a model can process at once. Think of it as the model’s working memory.
  • Prompt engineering — Writing inputs that use tokens efficiently. Shorter, clearer prompts save money and get better results.
  • Inference — The process of the model generating output tokens from your input tokens. That’s where the cost lives.
  • Tokenization — The actual breaking-apart of text into tokens. Different models tokenize differently, so the same sentence might cost more or less depending on the model.

Want help with this in your business?

If you’re curious how many tokens your business processes each month — or want help picking a model that won’t blow your budget — just email me or fill out the lead form. I’ll walk you through it.