AI Glossary
Overfitting is when an AI model gets so good at its training examples that it fails to handle new, real-world situations — like a student who aces the practice test but bombs the final because they memorized answers instead of learning the material.
What it really means
Let me explain this with a story. A few months ago, I was helping a Winter Park dental practice build a model to predict which patients were likely to cancel appointments. They gave me two years of patient data — appointment times, day of week, insurance type, age, everything. I trained a model, and it hit 98% accuracy on the training data. The dentist was thrilled.
I wasn’t. That number was too good to be true.
When I ran the model on the next month’s actual appointments, its accuracy dropped to 65%. What happened? The model had learned patterns that were specific to that two-year window — like “patients named Susan who book on Tuesdays in March always cancel” — instead of general patterns like “patients who book more than three weeks out are more likely to cancel.”
That’s overfitting. The model memorized the noise and quirks of the training data instead of the underlying signal. It’s like a chef who can perfectly recreate one specific dish from a single restaurant but can’t cook anything else — they’ve memorized the recipe, not learned how to cook.
Where it shows up
Overfitting happens anytime you train a model on a dataset that’s too small, too narrow, or too noisy. I see it most often in three places:
- Small datasets. A Sanford auto shop had only 50 service records when they tried to build a predictive maintenance model. The model learned which specific cars had problems, not which general conditions predict failures.
- Too many features. A Lake Nona restaurant gave me a dataset with 200 columns — including the weather on each day, the phase of the moon, and the number of Instagram posts about their brunch. The model found spurious correlations that didn’t hold up next month.
- Training too long. Think of it like studying for a test. If you study the same practice problems for 100 hours, you’ll know those problems cold — but you won’t know how to solve new ones. AI models do the same thing when you run too many training passes over the same data.
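If you want to see this failure mode in miniature, here's a pure-Python sketch with invented data: a "memorizer" model (1-nearest-neighbor) that echoes the closest training example scores perfectly on the data it has already seen, then stumbles on fresh data, while the simple underlying rule holds up on both.

```python
import random

random.seed(42)

def make_data(n):
    # Hypothetical toy data. True rule: label is 1 when x > 0.5,
    # but 20% of labels are flipped to simulate real-world noise.
    data = []
    for _ in range(n):
        x = random.random()
        label = 1 if x > 0.5 else 0
        if random.random() < 0.2:
            label = 1 - label
        data.append((x, label))
    return data

train = make_data(200)
test = make_data(200)

def memorizer(x):
    # 1-nearest-neighbor: echo the label of the closest training
    # point. This memorizes every quirk, noise included.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def simple_rule(x):
    # The underlying signal the model should have learned.
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(f"memorizer: train {accuracy(memorizer, train):.0%}, new data {accuracy(memorizer, test):.0%}")
print(f"simple rule: train {accuracy(simple_rule, train):.0%}, new data {accuracy(simple_rule, test):.0%}")
```

The memorizer scores 100% on its own training set (every point is its own nearest neighbor) and noticeably worse on fresh data, while the plain threshold rule lands around the same accuracy on both. That gap between training and new-data performance is overfitting in one picture.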
Common SMB use cases
Overfitting isn’t something you want to do — it’s something you need to avoid. Here’s where I help Central Florida business owners watch for it:
- Customer churn prediction. A Maitland HVAC company wanted to know which customers might switch to a competitor. If the model overfits, it might flag “customers who called on a Tuesday in July” as high-risk, when that’s just random noise.
- Inventory forecasting. A Clermont pool service tried to predict chemical supply needs. An overfit model might learn that “customers named Bob order more chlorine” — a meaningless pattern that won’t help next season.
- Lead scoring. A downtown Orlando law firm wanted to prioritize potential clients. An overfit model could decide that “people who email at 2:17 PM on Thursdays” are better leads, when that’s just a quirk of the training data.
- Fraud detection. Any model that flags suspicious transactions can overfit to specific dates, amounts, or merchant names that aren’t actually fraudulent — just unusual in the training set.
In every case, the fix is the same: test the model on data it has never seen before. If performance drops sharply, you’ve got overfitting.
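That test can be sketched in a few lines of Python. The chronological split and the 5-point "drop" threshold below are my own illustrative choices, not a standard:

```python
def time_split(records, holdout_frac=0.2):
    # Records should be sorted oldest-to-newest. Hold out the most
    # recent slice so the test mimics "next month's numbers"
    # instead of a random shuffle of last year's.
    cut = int(len(records) * (1 - holdout_frac))
    return records[:cut], records[cut:]

def check_generalization(train_acc, holdout_acc, max_drop=0.05):
    # Flag likely overfitting when accuracy on held-out data falls
    # well below training accuracy. The 5-point threshold is a
    # rule of thumb, not a standard.
    drop = train_acc - holdout_acc
    return "likely overfit" if drop > max_drop else "looks OK"

# The dental-practice numbers from the story above:
print(check_generalization(0.98, 0.65))  # prints "likely overfit"
```

Holding out the most recent slice of data, rather than a random sample, matters for businesses with seasonal patterns: it answers the question you actually care about, which is how the model performs going forward.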
Pitfalls (what gets oversold)
Here’s what I hear from business owners who’ve been burned:
- “Our model is 99% accurate!” I’ve never seen a real-world model hit 99% accuracy on new data. If someone tells you that, ask to see results on data the model wasn’t trained on. Chances are they’re overfitting.
- “More data is always better.” Not if the data is noisy or irrelevant. A model trained on 10,000 messy records can overfit worse than one trained on 500 clean, relevant records.
- “We can just add more features to improve accuracy.” This is the most common trap. Every extra feature gives the model another chance to memorize noise. I’ve seen models that included the day of the week, the weather, and the stock market — and performed worse than a simple model with just two or three meaningful inputs.
- “The model works great on our historical data.” That’s the bare minimum. The real test is whether it works on tomorrow’s data. I always tell clients: “Show me what happens when you run it on next month’s numbers, not last year’s.”
The overselling happens when vendors or tools brag about training accuracy without showing you real-world validation. If it sounds too good to be true, it’s probably overfit.
Related terms
- Underfitting. The opposite problem — the model is too simple and misses the patterns entirely. It’s like studying for five minutes and failing both the practice test and the real one.
- Bias-variance tradeoff. The balancing act between underfitting (high bias) and overfitting (high variance). You want a model that’s complex enough to learn patterns but simple enough to generalize.
- Cross-validation. A technique where you split your data into multiple chunks, train on most of them, and test on the leftover chunk. It’s the best way to catch overfitting before you deploy a model.
- Regularization. A set of methods that penalize the model for being too complex. Think of it as a speed bump that forces the model to focus on the most important patterns.
- Generalization. The goal of all machine learning — a model that performs well on new, unseen data. Overfitting is the enemy of generalization.
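To make the cross-validation idea concrete, here's a pure-Python sketch with invented data and a deliberately simple threshold model; each of the five folds takes one turn as the never-seen test chunk:

```python
import random

random.seed(7)

# Toy data, invented for illustration: label is 1 when x > 0.5,
# with 15% of labels flipped as noise.
data = []
for _ in range(100):
    x = random.random()
    label = 1 if x > 0.5 else 0
    if random.random() < 0.15:
        label = 1 - label
    data.append((x, label))

def kfold(data, k=5):
    # Split into k folds; each fold takes one turn as the test set
    # while the other k-1 folds form the training set.
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        yield train, folds[i]

def fit_threshold(train):
    # "Training" here just picks the cutoff that best separates
    # the training labels, a deliberately simple model.
    return max((t / 20 for t in range(21)),
               key=lambda t: sum((x > t) == (y == 1) for x, y in train))

def accuracy(thresh, data):
    return sum((x > thresh) == (y == 1) for x, y in data) / len(data)

scores = [accuracy(fit_threshold(train), test) for train, test in kfold(data)]
print(f"per-fold accuracy: {[f'{s:.0%}' for s in scores]}")
print(f"cross-validated average: {sum(scores) / len(scores):.0%}")
```

If the per-fold scores are all in the same ballpark, the model is picking up a real pattern; a model that only memorized its training chunk would swing wildly from fold to fold.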
Want help with this in your business?
If you’re wondering whether your current AI tools are overfitting or just underdelivering, I’m happy to take a quick look — just email me or use the contact form on this site.