<i>An anonymized case study: how we combined CRM history and website clickstream data to predict which leads would close, why we ran fairness checks, and how it changed the broker’s daily call list.</i>
(Client details are anonymized and some specifics composited at the client’s request.)
I’ll never forget sitting down with the owner of a mortgage brokerage in Altamonte Springs. He had a stack of unfiled loan applications on his desk, a CRM that hadn’t been cleaned in three years, and a sales team that spent most of its time arguing over who got the “good” leads. He looked at me and said: “I know we’re leaving money on the table. I just don’t know where.”
That’s the kind of problem I love to dig into. It’s not about flashy tech. It’s about fixing a broken process with the right data and a little AI.
The Situation: A Lead Overflow with No Priority System
This brokerage had been in business for over a decade. They generated leads through Zillow referrals, Google Ads, direct mail, and past-client referrals. They were pulling in about 150 new leads per month on average. But their close rate had tanked—down to around 8% from 14% three years earlier.
The real problem? No one could agree on which leads to call first. Some loan officers chased every new lead immediately. Others cherry-picked the ones that looked easiest. High-value leads sat untouched for days while low-quality leads got immediate attention. They were burning time and money.
I started by pulling six months of CRM data: lead source, time of day, loan type, credit score range, property value, and—most importantly—whether the lead actually converted to a closed loan. Then I layered in website behavior from their analytics tool: pages visited, time on site, number of return visits, whether they used the mortgage calculator.
What They Had Tried Before (and Why It Didn’t Work)
They weren’t strangers to lead scoring. They’d already tried a simple points-based system: +10 for a high credit score, +5 for a referral source, -5 for a Zillow lead. It was arbitrary. No one really knew if those points predicted a closed loan. The loan officers ignored it because it often gave high scores to leads that never even answered the phone.
They’d also burned money on a third-party lead-scoring service built on demographic data. It was pricey and didn’t account for local behavior patterns. A lead from Winter Park might look fantastic on paper but never respond to emails. A lead from Apopka might have a lower credit score but way higher intent to actually close. The generic model missed those nuances entirely.
That’s when they called me.
The AI Work: Building a Custom Lead-Scoring Model
We decided to build a machine learning model from scratch using their own data. Pretty straightforward goal: predict the probability that a lead would close within 90 days.
Data Collection and Cleaning
First step was exporting all CRM records from the past 18 months, roughly 2,700 leads. I removed duplicates, standardized the fields, and merged in website behavior from Google Analytics using the lead’s email as a key. About 60% of leads had matching website data. For the rest, we worked with CRM-only features.
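For anyone who wants to picture the cleanup step, here is a minimal sketch of a dedupe followed by a left join on email. The field names (`created`, `source`, `visits`, `used_calculator`) and the sample rows are made up for illustration, and keeping the earliest record per email is an assumption, not necessarily the rule we used on the client's data.

```python
def normalize_email(email):
    """Lowercase and trim so 'Ana@Example.com ' and 'ana@example.com' match."""
    return email.strip().lower()


def dedupe_leads(crm_rows):
    """Keep the earliest record per normalized email (assumed rule)."""
    seen = {}
    for row in sorted(crm_rows, key=lambda r: r["created"]):
        key = normalize_email(row["email"])
        seen.setdefault(key, {**row, "email": key})
    return list(seen.values())


def merge_web(leads, web_rows):
    """Left join: every lead is kept; web fields are None when unmatched."""
    web_by_email = {normalize_email(w["email"]): w for w in web_rows}
    merged = []
    for lead in leads:
        web = web_by_email.get(lead["email"])
        merged.append({
            **lead,
            "visits": web["visits"] if web else None,
            "used_calculator": web["used_calculator"] if web else None,
            "has_web_data": web is not None,
        })
    return merged


# Hypothetical sample rows, not client data.
crm = [
    {"email": "Ana@example.com ", "created": "2023-01-05", "source": "referral"},
    {"email": "ana@example.com", "created": "2023-02-01", "source": "google_ads"},
    {"email": "bo@example.com", "created": "2023-01-09", "source": "zillow"},
]
web = [{"email": "ana@example.com", "visits": 4, "used_calculator": True}]

leads = merge_web(dedupe_leads(crm), web)
match_rate = sum(lead["has_web_data"] for lead in leads) / len(leads)
print(f"{len(leads)} leads, {match_rate:.0%} with website data")
```

The `has_web_data` flag matters later: it lets the model tell "no website visits" apart from "we could not match this lead to any website data."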
Feature Engineering
I built around 30 candidate features, including:
- Lead source (categorical, one-hot encoded)
- Time to first contact (hours)
- Number of website visits before submission
- Mortgage calculator usage (yes/no)
- Pages per session
- Credit score range (binned)
- Loan-to-value ratio
- Property type (single-family, condo, etc.)
- Day of week and hour of submission
- Past client referral flag
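A few of the features above can be sketched in plain Python. The source categories, credit score cut points, and field names here are illustrative assumptions, not the client's actual schema.

```python
# Assumed categories and cut points, for illustration only.
SOURCES = ["referral", "google_ads", "zillow", "direct_mail"]
CREDIT_BINS = [
    (0, 620, "below_620"),
    (620, 680, "620_679"),
    (680, 740, "680_739"),
    (740, 851, "740_plus"),
]


def credit_bin(score):
    """Map a numeric credit score to a labeled range."""
    for low, high, label in CREDIT_BINS:
        if low <= score < high:
            return label
    raise ValueError(f"credit score out of range: {score}")


def encode(lead):
    """Turn one lead record into a flat dict of numeric features."""
    # One-hot encode the lead source.
    features = {f"source_{s}": int(lead["source"] == s) for s in SOURCES}
    # One-hot encode the binned credit score.
    lead_bin = credit_bin(lead["credit_score"])
    for _, _, label in CREDIT_BINS:
        features[f"credit_{label}"] = int(lead_bin == label)
    # Loan-to-value ratio.
    features["ltv"] = lead["loan_amount"] / lead["property_value"]
    # Website behavior, with a flag for leads that had no matched web data.
    features["has_web_data"] = int(lead.get("visits") is not None)
    features["visits"] = lead.get("visits") or 0
    features["used_calculator"] = int(bool(lead.get("used_calculator")))
    return features


row = encode({
    "source": "referral",
    "credit_score": 705,
    "loan_amount": 240_000,
    "property_value": 300_000,
    "visits": 3,
    "used_calculator": True,
})
crm_only = encode({
    "source": "zillow",
    "credit_score": 610,
    "loan_amount": 180_000,
    "property_value": 200_000,
})
print(row)
```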
Model Selection and Training
I went with a gradient-boosted tree model (XGBoost) because it handles mixed data types well and gives you feature importance scores. I split the data 80/20 for training and testing. The target was binary: “closed loan within 90 days” or not.
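Here is a rough sketch of an 80/20 split that keeps the share of closed loans the same on both sides. Stratifying by outcome is my assumption for the sketch (it is a sensible default when only a small fraction of leads close), and `closed_90d` and `lead_id` are hypothetical field names. The XGBoost training step itself is left out.

```python
import random


def stratified_split(rows, label_key="closed_90d", test_frac=0.2, seed=42):
    """Split rows into train/test, preserving the positive rate in each."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r[label_key] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic data: 1,000 leads, 10% of which closed within 90 days.
rows = [{"lead_id": i, "closed_90d": int(i % 10 == 0)} for i in range(1000)]
train, test = stratified_split(rows)
print(len(train), len(test), sum(r["closed_90d"] for r in test))
```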
The model hit an AUC of 0.82 on the test set. Not perfect, but good enough to meaningfully prioritize leads. The top five most important features ended up being:
- Number of website visits before submission
- Mortgage calculator usage
- Lead source (referral vs. paid)
- Time to first contact (shorter = better)
- Credit score range
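If AUC is unfamiliar: it is the probability that a randomly chosen lead that closed gets a higher score than a randomly chosen lead that did not. A small sketch of that definition on toy numbers (the labels and scores below are invented, not project data):

```python
def auc(labels, scores):
    """Share of (closed, not-closed) pairs where the closed lead scores higher.

    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = closed within 90 days, 0 = did not.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.6]
toy_auc = auc(labels, scores)
print(round(toy_auc, 3))
```

A value of 0.5 means the ranking is no better than a coin flip, and 1.0 means every closed lead outranks every lead that did not close. The pairwise loop is fine for a sketch but slow on large datasets, where a library implementation is the better choice.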
Fairness Checks
Look, I ran two fairness checks because this matters. First, I looked for bias by zip code. Altamonte Springs and the surrounding areas have diverse neighborhoods. The model didn’t show significant differences in predicted scores across zip codes after controlling for income proxies. Second, I checked for racial bias using name-based ethnicity estimation (a common proxy). The model slightly over-scored leads with names typically associated with white borrowers. We mitigated this by adding a fairness constraint during training, essentially penalizing the model for large disparities in false positive rates across groups. The final model’s balanced accuracy stayed within 2% across groups.
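To make the two metrics concrete, here is a minimal audit sketch that computes false positive rate and balanced accuracy per group from scored leads. It covers only the measurement side, not the training-time constraint. The group labels, the 0.5 threshold, and the records are all hypothetical.

```python
from collections import defaultdict


def group_rates(records, threshold=0.5):
    """Per-group false positive rate, true positive rate, balanced accuracy.

    Assumes every group has at least one closed and one unclosed lead.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for r in records:
        predicted = r["score"] >= threshold
        actual = r["closed"] == 1
        key = ("t" if predicted == actual else "f") + ("p" if predicted else "n")
        counts[r["group"]][key] += 1
    rates = {}
    for group, c in counts.items():
        tpr = c["tp"] / (c["tp"] + c["fn"])
        fpr = c["fp"] / (c["fp"] + c["tn"])
        rates[group] = {
            "fpr": fpr,
            "tpr": tpr,
            "balanced_accuracy": (tpr + (1 - fpr)) / 2,
        }
    return rates


# Invented records for two hypothetical groups, A and B.
records = [
    {"group": "A", "score": 0.9, "closed": 1},
    {"group": "A", "score": 0.8, "closed": 0},
    {"group": "A", "score": 0.3, "closed": 0},
    {"group": "A", "score": 0.2, "closed": 1},
    {"group": "B", "score": 0.7, "closed": 1},
    {"group": "B", "score": 0.6, "closed": 1},
    {"group": "B", "score": 0.4, "closed": 0},
    {"group": "B", "score": 0.1, "closed": 0},
]

rates = group_rates(records)
fpr_gap = max(r["fpr"] for r in rates.values()) - min(r["fpr"] for r in rates.values())
print(rates, fpr_gap)
```

A large `fpr_gap` means one group's non-closing leads get pushed to the top of the call list more often than another's, which is the disparity a training-time penalty would try to shrink.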
Human-in-the-Loop
We deliberately kept a human in the loop for the first month. The model output a score from 0 to 100, but loan officers could override it if they had personal knowledge (say, a lead from a past client who was just browsing). After 30 days, we compared model-suggested call order to actual outcomes and found the model was right 85% of the time. We then automated the call list fully, though loan officers could still manually promote a lead if they had a strong reason.
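One simple way to combine the score with a manual override is to sort promoted leads to the top and rank everything else by score. This is a sketch of that idea only; the field names and the "promoted leads go first" rule are assumptions, not a description of the production call list.

```python
def build_call_list(leads, promoted_ids=()):
    """Manually promoted leads first, then the rest by descending score."""
    promoted = set(promoted_ids)
    return sorted(
        leads,
        key=lambda lead: (lead["lead_id"] not in promoted, -lead["score"]),
    )


# Hypothetical leads with 0-100 model scores.
leads = [
    {"lead_id": "a", "score": 82},
    {"lead_id": "b", "score": 35},
    {"lead_id": "c", "score": 67},
]

model_order = [lead["lead_id"] for lead in build_call_list(leads)]
with_override = [lead["lead_id"] for lead in build_call_list(leads, promoted_ids=["b"])]
print(model_order, with_override)
```

Keeping the override as a separate input, rather than editing the score, also leaves a record of when a loan officer disagreed with the model, which is useful when reviewing how the model did.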
Results: Measured Outcomes
After three months, here’s what changed:
- Close rate jumped from 8% to 12% (a 50% relative improvement, or 4 percentage points)
- Average time to first contact dropped from 4 hours to 45 minutes
- Loan officers spent 10 fewer hours per week just sorting leads
- Revenue per lead increased by 22%
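The percentages above are relative changes. A quick sketch of the arithmetic, using only the before and after figures from the list:

```python
def relative_change(before, after):
    """Change as a fraction of the starting value."""
    return (after - before) / before


close_rate_lift = relative_change(0.08, 0.12)    # 8% -> 12% close rate
contact_time_change = relative_change(240, 45)   # 4 hours -> 45 minutes

print(f"Close rate: {close_rate_lift:+.0%}")
print(f"Time to first contact: {contact_time_change:+.0%}")
```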
One loan officer told me: “I used to call 20 leads a day and maybe close one. Now I call 10 and close two. I actually have time to follow up properly.” That’s real change.
The owner estimated the model saved the company about $4,500 per month in wasted sales effort.
“The model doesn’t replace our judgment—it makes sure we use our judgment on the right leads.” — Loan officer, Altamonte Springs brokerage
What We’d Do Differently (Honest Caveats)
Honestly, this wasn’t a perfect project. A few things I’d change:
- More data on lead behavior after first contact. We only had pre-submission behavior. Adding email open rates, call pickup rates, and follow-up engagement would’ve improved the model considerably.
- Better tracking of offline conversions. Some leads came in via phone call and never touched the website. We missed those signals entirely.
- More frequent retraining. The market changes. We retrained quarterly, but monthly might be better during rate fluctuations.
- Including loan officer performance. Some officers closed leads at higher rates regardless of lead quality. Adding that as a feature could’ve helped us assign leads to the right person.
The fairness check was also eye-opening. Even with a relatively small dataset, bias can creep in. I now recommend fairness audits as standard on any scoring model.
How This Fits Into a Broader AI Strategy
This wasn’t a one-off project. It was part of a larger effort to make the brokerage more efficient. After the scoring model worked out, we implemented an AI voice agent to handle initial qualification calls (which freed up even more time). You can read about that in our AI voice agent implementation guide. We also helped them adopt Microsoft 365 Copilot to automate follow-up emails and appointment scheduling (see our rollout process).
If you’re thinking about a similar project, I’d recommend starting with an AI readiness assessment to see where your data actually stands. Not every business has clean enough data to build a model right away. But for this Altamonte Springs broker, it was the difference between drowning in leads and actually closing loans.
Want to talk through your own lead-scoring challenges? Get in touch.
Frequently asked questions
How long did it take to build the lead-scoring model?
About six weeks from data collection to deployment. The first two weeks were spent cleaning and merging data. Model training and testing took another week. We then ran a one-month pilot with human oversight before fully automating.
What data do you need to build a similar model?
At minimum, you need historical lead records with a conversion outcome (closed or not), lead source, and some behavioral data like website visits or call logs. In general, the more history you have, the more reliable the model will be.
How do you ensure the model is fair?
We run bias checks by zip code, name-based ethnicity estimation, and other proxies. We also use fairness constraints during training to penalize large disparities in false positive rates across groups. Regular audits are recommended.
Can this work for a smaller business with fewer leads?
Yes, but the model may be less accurate. With fewer than 500 leads, I’d recommend a simpler rule-based system or a hybrid approach. We can help you decide during an AI readiness assessment.
What if a lead doesn’t have website behavior data?
We handle that by using CRM-only features for those leads. The model still works, but the score may be less precise. We recommend capturing website behavior for all leads if possible.
Do you still need a human in the loop?
We recommend keeping a human override option, especially for leads from past clients or personal referrals. The model is a prioritization tool, not a decision maker.