Context Window

AI Glossary

Think of the context window as how much information an AI can keep in its short-term memory while answering a single question.

What it really means

When you talk to an AI, it doesn’t remember anything from one conversation to the next unless something feeds that history back in. And even within a single conversation, there’s a limit to how much it can hold in mind at once. That limit is the context window.

I like to describe it as a whiteboard. You can write on it, erase parts, and add new notes. But eventually, the board fills up. When that happens, the AI has to start forgetting older information to make room for new stuff. The bigger the whiteboard — the larger the context window — the more it can keep track of in one go.

Context windows are measured in tokens, which are roughly pieces of words. A token might be a short word like “the,” or a chunk of a longer word. For most practical purposes, think of 1 token as about 0.75 words in English. So a context window of 4,000 tokens can hold roughly 3,000 words. Some newer models have windows of 100,000 tokens or more — enough to handle an entire novel in one request.
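
If you want to sanity-check that math yourself, here’s a minimal sketch in plain Python using the rough 0.75-words-per-token rule (real tokenizers, like the ones model vendors publish, give exact counts; this is ballpark only):

```python
# Rough token estimate using the ~0.75 words-per-token rule of thumb.
# Real tokenizers give exact counts; this is only for ballpark planning.

WORDS_PER_TOKEN = 0.75  # rule-of-thumb ratio for English text

def estimate_tokens(text: str) -> int:
    """Estimate how many tokens a piece of English text will use."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def fits_in_window(text: str, window_tokens: int = 4_000) -> bool:
    """Check whether text plausibly fits in a given context window."""
    return estimate_tokens(text) <= window_tokens

document = "the quick brown fox jumps " * 800  # ~4,000 words of stand-in text
print(estimate_tokens(document))               # about 5,333 tokens
print(fits_in_window(document))                # False for a 4,000-token window
```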

Where it shows up

You’ll hear about context windows when you’re choosing an AI model or setting up a tool. For example, if you’re using ChatGPT, Claude, or a custom model for your business, the context window size is a key spec. It determines how much you can paste into a prompt, how long a conversation can go before the AI starts forgetting, and how much data you can feed it for analysis.

It also comes up in any tool that uses AI to process documents. If you upload a 50-page contract and ask questions about it, the AI needs a context window big enough to hold the whole document. If the window is too small, it will only “read” part of the contract before answering — and that’s a recipe for mistakes.
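
When a document won’t fit, the standard workaround is to split it into overlapping chunks and process them one at a time. Here’s a minimal sketch of that idea (the chunk size and overlap are illustrative values, not tuned recommendations):

```python
def chunk_text(text: str, max_tokens: int = 3_000, overlap_tokens: int = 200) -> list[str]:
    """Split text into pieces that each fit a token budget.

    A little overlap between chunks helps preserve context across the
    boundaries. Token counts use the rough 0.75 words-per-token estimate.
    """
    words = text.split()
    max_words = int(max_tokens * 0.75)        # token budget converted to words
    step = max_words - int(overlap_tokens * 0.75)

    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), step)
    ]

# A 50-page contract (roughly 25,000 words) becomes a dozen window-sized pieces.
contract = "lorem ipsum dolor sit amet " * 5_000  # stand-in for the real thing
print(len(chunk_text(contract)))
```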

Common SMB use cases

I’ve seen context windows matter most in a few everyday situations for Central Florida businesses:

  • A Winter Park dental practice uses an AI assistant to answer patient questions about procedures. When a patient asks about a root canal, the AI needs to hold the full conversation history — plus the practice’s policies and pricing — in its context window. A small window means it might forget the patient already said they’re anxious about needles.
  • An HVAC company in Maitland uploads a year’s worth of service records to ask: “Which neighborhoods have the most compressor failures?” If the context window is too small, the AI can only look at a few months of data and might miss seasonal patterns.
  • A law firm in downtown Orlando uses AI to review discovery documents. With a large context window, they can paste an entire deposition transcript and ask specific questions about contradictions. With a small one, they have to break it into chunks and risk losing the thread.
  • A pool service in Clermont has an AI chatbot on their website. When a customer asks about pricing, then scheduling, then chemical treatments, the AI needs to remember the earlier parts of the conversation. A small context window means it might forget the customer’s address halfway through.

Pitfalls (what gets oversold)

Bigger context windows sound better, and they often are — but there are catches.

First, bigger isn’t always faster. Processing time grows with how much text you actually send, so a request that fills a 100,000-token window can take noticeably longer to come back. For quick back-and-forth conversations, keeping prompts lean can actually feel snappier.

Second, the AI doesn’t use the whole window equally well. Research shows that models tend to pay more attention to information at the beginning and end of the context window. Important details buried in the middle can get lost, even if they’re technically “in memory.” So don’t assume that a huge window means perfect recall — you still need to structure your prompts carefully.
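
One practical habit follows from this: repeat the details that absolutely cannot be missed at the beginning and end of the prompt, with the bulk material in between. A minimal sketch (the helper and field names are mine, purely for illustration):

```python
def build_prompt(key_facts: str, bulk_context: str, question: str) -> str:
    """Place must-not-miss details at the start and end of the prompt,
    where models tend to attend best; long material goes in the middle."""
    return "\n\n".join([
        f"Key facts (do not ignore): {key_facts}",
        bulk_context,  # the long document, transcript, or chat history
        f"Reminder of key facts: {key_facts}",
        f"Question: {question}",
    ])

prompt = build_prompt(
    key_facts="The patient is anxious about needles.",
    bulk_context="...full conversation history and office policies...",
    question="How should we describe the root canal procedure?",
)
print(prompt)
```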

Third, some vendors hype context window size as a magic bullet. I’ve seen pitches like “Our AI remembers everything!” That’s misleading. The context window is about a single request, not long-term memory. If you want the AI to remember things across days or weeks, you need a separate system — like saving conversation summaries or using a database. The context window alone won’t do that.
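
To make that concrete, here’s the shape of the “saved summary” approach: store a short recap after each conversation, then prepend it the next time that customer shows up. (The file-based store and customer ID are stand-ins; a real tool would use a database and an actual summarization call to the model.)

```python
import json
from pathlib import Path

MEMORY_FILE = Path("customer_memory.json")  # stand-in for a real database

def load_memory(customer_id: str) -> str:
    """Fetch the saved summary for a returning customer, if any."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text()).get(customer_id, "")
    return ""

def save_memory(customer_id: str, summary: str) -> None:
    """Persist a short summary so future sessions can start from it."""
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    memory[customer_id] = summary
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

# New session: prepend what we saved last time, so the model "remembers"
# across visits even though its context window does not.
past = load_memory("pool-customer-417")
opening = f"What we know from earlier conversations: {past}\n\nNew question: ..."
```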

Finally, cost. You’re typically billed by the token, and filling a larger window means sending more tokens with every request (some vendors also charge a higher rate for long-context models). For a small business, paying to fill a 100,000-token window when you only need 4,000 is like buying a dump truck to move a couch.
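
A quick back-of-the-envelope comparison shows why that matters. The per-token price below is a placeholder, not any vendor’s actual rate; plug in your own numbers:

```python
# PLACEHOLDER pricing for illustration only; check your vendor's rate card.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # hypothetical: $0.01 per 1,000 input tokens

def request_cost(input_tokens: int) -> float:
    return input_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS

print(f"${request_cost(4_000):.2f} per request at 4,000 tokens")      # $0.04
print(f"${request_cost(100_000):.2f} per request at 100,000 tokens")  # $1.00
# At 500 requests a day, that gap is roughly $20/day versus $500/day.
```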

Related terms

  • Token: The basic unit of text that models process. A token is roughly a word fragment. Understanding tokens helps you estimate how much text fits in a context window.
  • Prompt engineering: The practice of writing inputs to get the best results from an AI. Good prompt engineering often involves working within — or around — the context window’s limits.
  • Retrieval-augmented generation (RAG): A technique where the AI pulls relevant information from a database or document store, rather than trying to fit everything into the context window. This is how many business tools handle large amounts of data without needing a massive window (see the sketch after this list).
  • Fine-tuning: Training a model on your specific data. This doesn’t change the context window size, but it can make the model more efficient with the window it has.
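
Since RAG is the usual answer when your data won’t fit in the window, here’s a deliberately tiny sketch of the idea. Real systems use embeddings and a vector database for the retrieval step; this toy version just ranks documents by shared keywords, and the snippets are invented:

```python
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by how many words they share with
    the question. Real RAG uses semantic (embedding-based) search."""
    q = words(question)
    return sorted(documents, key=lambda d: len(q & words(d)), reverse=True)[:top_k]

documents = [
    "Compressor failures cluster in units installed before 2015.",
    "Our service plan covers two tune-ups per year.",
    "July and August see the most compressor failures in Maitland.",
]

question = "Which months have the most compressor failures?"
context = "\n".join(retrieve(question, documents))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# Only the relevant snippets enter the window, not the whole archive.
```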

Want help with this in your business?

If you’re trying to figure out what context window size makes sense for your business — or whether a tool you’re considering is overselling it — I’m happy to talk it through. Just email me or use the contact form.