<i>A 200,000-SKU catalog, counter staff who knew parts by feel, and a search that finally understood “the thingy that goes on the pump.” Here’s how we built it.</i>
(Client details are anonymized and some specifics composited at the client’s request.)
I walked into a dusty back office off Orange Blossom Trail. The owner, a third-generation distributor, pointed at a stack of paper catalogs. “My best counter guy retires next month,” he said. “He knows every part by smell. The new hires can’t find a hydraulic fitting if you gave them the part number.”
They had a 200,000-SKU catalog of industrial parts — valves, seals, bearings, fittings, hoses — spread across an ancient ERP and a bunch of spreadsheets. Customers called in with descriptions like “the brass thingy that goes on the compressor” or “that black rubber seal for a 2-inch pipe.” Counter staff would spend 5–10 minutes per call, hunting through categories. Missed calls piled up. The owner estimated they were losing 60 calls a day to hold times and frustration.
They’d tried a few things before calling me. A junior developer built a keyword search that matched exact terms. It failed on synonyms: “seal” didn’t find “gasket,” “O-ring,” or “packing.” They tried a third-party product database, but it didn’t cover their niche inventory. Nothing worked.
I suggested we build a semantic search using embeddings and a vector database. Here’s the idea: instead of matching keywords, we convert every part description and customer query into a numerical vector — a list of numbers that captures meaning. Then we find the closest vectors in the catalog. That way, “thingy that goes on the pump” can match “impeller seal kit” if the vectors are close enough.
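To make the idea concrete, here is a toy sketch of "find the closest vector." The three-number vectors below are made up for illustration; real embeddings have hundreds of dimensions and come from a model, not from hand-typed numbers.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy vectors, invented for this example only.
catalog = {
    "impeller seal kit": [0.9, 0.1, 0.2],
    "brass hose fitting": [0.1, 0.9, 0.1],
    "conveyor belt roller": [0.1, 0.2, 0.9],
}

# Pretend this is the vector for "thingy that goes on the pump".
query_vector = [0.8, 0.2, 0.3]

# The best match is the catalog entry whose vector points the same way as the query.
best_match = max(catalog, key=lambda name: cosine_similarity(query_vector, catalog[name]))
```

With these toy numbers, the query lands closest to the seal kit, even though the two strings share no words.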
The Situation: What Was Breaking
The distributor had three locations in Central Florida: a main warehouse in Orlando, a smaller branch in Sanford, and a counter in Winter Park. Each location had its own way of naming parts. The ERP had 200,000 SKUs, but descriptions were a mess: some were one word, some were paragraphs, some were just “see paper catalog.”
Counter staff spent an average of 8 minutes per call just finding the right part. With 200+ calls a day, that’s over 26 hours of search time daily. Customers hung up after 2 minutes on hold. The owner estimated $4,500 a month in lost sales from abandoned calls alone.
The biggest pain point? New hires took months to learn the “tribal knowledge” — the nicknames and synonyms that seasoned staff used. “That red handle thing” meant a specific toggle valve. “The little plastic clip” was a retaining ring for a pneumatic cylinder. None of it was documented. Honestly, this happens everywhere, and it kills efficiency.
What They Tried Before (And Why It Failed)
They tried a standard SQL full-text search. It returned results only if the exact word appeared. “O-ring” didn’t find “seal.” “Hydraulic” didn’t find “hydro.” They tried a product taxonomy from a trade association, but it didn’t cover their custom inventory — they stocked parts for irrigation pumps, conveyor belts, and food-processing equipment that weren’t in any standard database.
They even hired a temp to manually tag every SKU with synonyms. After two weeks and 5,000 SKUs, they gave up. The catalog was too large, and the synonyms kept changing as new parts arrived. That’s when they called me.
The AI Work We Did: Building Semantic Search
We started with a readiness assessment to understand their data quality, infrastructure, and goals. Then we built a pipeline in three phases.
Phase 1: Data Cleanup and Embedding Generation
We extracted all 200,000 SKUs from their ERP. Each SKU had a part number, a short description (average 12 words), a category, and sometimes a long description from the manufacturer. We cleaned the text: removed special characters, standardized casing, and concatenated the short and long descriptions into a single text field.
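A minimal sketch of that cleanup step might look like the following. The function names are hypothetical, and the exact set of characters worth keeping is an assumption on our part here (we keep hyphens, slashes, and dots so sizes like 1/2 and terms like O-ring survive); tune it to your own catalog.

```python
import re

def clean_text(text):
    """Lowercase, drop special characters, and collapse repeated whitespace."""
    text = text.lower()
    # Assumption: keep letters, digits, whitespace, hyphens, slashes, and dots.
    text = re.sub(r"[^a-z0-9\s\-/.]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def build_search_text(short_desc, long_desc=None):
    """Concatenate the short and (optional) long description into one field."""
    parts = [clean_text(short_desc)]
    if long_desc:
        parts.append(clean_text(long_desc))
    return " ".join(part for part in parts if part)
```

For example, `build_search_text("SEAL, O-Ring NBR", "Buna-N (nitrile) O-ring; 70A")` produces a single lowercase string with the punctuation stripped and both descriptions joined.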
Then we used a pre-trained sentence transformer model (all-MiniLM-L6-v2) to convert each description into a 384-dimensional vector. The model is small enough to run on a single GPU (it also runs on a CPU, just slower), and it captures semantic relationships: descriptions that mention “seal” and “gasket” end up close together in vector space. We stored all vectors in a vector database (Qdrant, running on a $40/month cloud VM).
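Roughly, the embed-and-store step can be sketched like this. It assumes the `sentence-transformers` and `qdrant-client` packages are installed; the collection name, URL, batch size, and the `search_text` and `part_number` field names are placeholders for illustration, not the client's actual setup.

```python
def build_points(skus, embed):
    """Pair each SKU with its vector. `embed` maps a list of texts to a list of vectors."""
    texts = [sku["search_text"] for sku in skus]
    vectors = embed(texts)
    return [
        {
            "id": i,
            "vector": list(vec),
            "payload": {"part_number": sku["part_number"], "text": sku["search_text"]},
        }
        for i, (sku, vec) in enumerate(zip(skus, vectors))
    ]

def load_embedder(model_name="all-MiniLM-L6-v2"):
    """Return an embed function backed by a sentence transformer (imported lazily)."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts, normalize_embeddings=True).tolist()

def upsert_points(points, url="http://localhost:6333", collection="parts", batch_size=256):
    """Create a cosine-distance collection and upload the points in batches."""
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams
    client = QdrantClient(url=url)
    # Note: create_collection raises an error if the collection already exists.
    client.create_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        client.upsert(
            collection_name=collection,
            points=[PointStruct(**point) for point in batch],
        )
```

Typical usage would be `upsert_points(build_points(skus, load_embedder()))`. Passing `embed` in as a function keeps `build_points` easy to test without downloading a model.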
Phase 2: Building the Synonym Layer
This was the hardest part. The staff used hundreds of undocumented synonyms. “C-clip,” “retaining ring,” “snap ring,” “circlip” — all the same thing. “Buna-N,” “nitrile,” “NBR” — same material. Look, you can’t just use a generic thesaurus for this stuff. These were industry-specific terms that only made sense if you’d been in the business for years.
We built a semi-automated synonym extraction process. First, we scraped the ERP’s transaction history — every time a counter person looked up a part and then sold it, we recorded the search term and the part selected. That gave us 50,000 query-result pairs. We then used a simple word2vec model trained on those pairs to find words that appeared in similar contexts. For example, “O-ring” and “seal” appeared in the same queries and results 80% of the time. We manually reviewed the top 500 candidate synonyms and approved 420 of them. Those went into a synonym dictionary that the search system uses to expand queries before embedding.
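Here is a rough sketch of that candidate-generation step using gensim's `Word2Vec`. How the query and the sold part's description get combined into one training "sentence" is an assumption for illustration, as are the function names and hyperparameters. The output is only a list of candidates for a human to review, not an approved dictionary.

```python
import re

def tokenize(text):
    """Lowercase and split into tokens, keeping hyphenated terms like 'o-ring' whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def build_training_sentences(pairs):
    """Turn (query, description_of_part_sold) pairs into token lists.

    Assumption: joining query and result into one sentence lets words from
    the query and the catalog description share a context window.
    """
    return [tokenize(query) + tokenize(description) for query, description in pairs]

def suggest_synonyms(sentences, term, topn=10):
    """Train a small word2vec model and return (word, similarity) candidates for review."""
    from gensim.models import Word2Vec  # imported lazily; requires gensim
    model = Word2Vec(
        sentences=sentences,
        vector_size=100,  # hypothetical settings, not tuned
        window=5,
        min_count=2,
        sg=1,
        seed=1,
    )
    if term not in model.wv:
        return []
    return model.wv.most_similar(term, topn=topn)
```

In practice you would call `suggest_synonyms` for each frequent query term, sort the candidates by similarity, and hand the top of the list to whoever approves synonyms.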
Phase 3: The Search Interface
We built a simple web app with a single search box for counter staff. The app first expands the query using the synonym dictionary. Then it embeds the expanded query with the same model used for the catalog. Finally, it queries the vector database for the 20 nearest neighbors (by cosine similarity) and returns the results ranked by similarity score.
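The expand, embed, and search flow can be sketched as below. The dictionary format (term mapped to a list of alternatives) and the "append the alternatives to the query" strategy are assumptions for illustration. `embed` and `vector_search` are passed in as plain functions, so in a real deployment they would wrap the embedding model and the vector database client.

```python
import re

def expand_query(query, synonyms):
    """Append known alternatives for any dictionary term found in the query."""
    text = query.lower()
    extra = []
    for term, alternatives in synonyms.items():
        # Match the term as a whole word or phrase, not inside another word.
        pattern = r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)"
        if re.search(pattern, text):
            for alt in alternatives:
                if alt.lower() not in text and alt not in extra:
                    extra.append(alt)
    return " ".join([query] + extra)

def search_parts(query, synonyms, embed, vector_search, top_k=20):
    """Expand the query, embed it, and return the top_k nearest catalog entries."""
    expanded = expand_query(query, synonyms)
    vector = embed(expanded)
    return vector_search(vector, top_k)
```

For example, with `{"c-clip": ["retaining ring", "snap ring", "circlip"]}` as the dictionary, a query for "little C-clip for cylinder" gets all three alternatives appended before it is embedded.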
We deliberately kept a human in the loop: the system shows the top 20 results, but the counter person still picks the right one. Nothing gets ordered automatically. The staff can also mark a result as “wrong” to improve future searches. That feedback gets logged and used to update the synonym dictionary monthly.
One thing that was harder than expected? Handling multi-word queries that were actually a single part name. “Water pump seal” could mean a seal for a water pump, or it could be a specific part called “Water Pump Seal” (model number WPS-200). We solved this by adding a simple rule: if the exact phrase exists in the catalog as a part name, boost it to the top of results.
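A minimal sketch of that boosting rule, assuming a lookup table of normalized part names built from the catalog ahead of time. The `name_index` structure and the field names are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so 'Water  Pump Seal' matches 'water pump seal'."""
    return " ".join(text.lower().split())

def boost_exact_name(query, hits, name_index):
    """Put the part whose name exactly equals the query at the top of the results.

    `name_index` maps normalized part names to part records.
    `hits` is the similarity-ranked list from the vector search.
    """
    exact = name_index.get(normalize(query))
    if exact is None:
        return hits
    # Drop the exact match from the ranked list so it is not shown twice.
    rest = [hit for hit in hits if hit["part_number"] != exact["part_number"]]
    return [exact] + rest
```

The rest of the ranking is left untouched, so a query like “water pump seal” still shows seals for water pumps right below the part that carries that exact name.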
Where We Kept a Human in the Loop
We didn’t automate everything. The counter staff are still the experts. The AI is a tool to get them to the right part faster. They review the top 20, confirm the match, and handle the sale. If the system is unsure (similarity score below 0.7), it shows fewer results and asks the staff to rephrase. We also kept the manual synonym review process — every month, we export new query logs, run the word2vec model again, and review new candidate synonyms. The owner or a senior staff member approves them. No exceptions.
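The low-confidence behavior can be sketched like this. It assumes "unsure" means the best hit scores below 0.7, and the reduced result count of 5 is a made-up number for illustration; only the 0.7 threshold and the top-20 limit come from the project.

```python
def present_results(hits, threshold=0.7, full_count=20, reduced_count=5):
    """Decide how many results to show and whether to ask for a rephrase.

    `hits` is a list of (part_number, similarity_score) tuples.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    # Assumption: the system counts as "unsure" when even the best hit is below the threshold.
    if not ranked or ranked[0][1] < threshold:
        return {"results": ranked[:reduced_count], "ask_rephrase": True}
    return {"results": ranked[:full_count], "ask_rephrase": False}
```

Either way, the counter person makes the final call; the flag only changes what the screen suggests.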
The Measured Results
After three months of use, here’s what we measured:
- Average search time dropped from 8 minutes to 45 seconds. Counter staff found the right part in under a minute on 90% of calls.
- Missed calls dropped by 70%. The owner reported that hold times went from 2+ minutes to under 30 seconds. They estimated recovering $3,000/month in lost sales.
- New hire ramp time shortened from 6 months to 6 weeks. New staff could find parts as well as veterans after a month of using the tool.
- Returns for wrong parts dropped by 40%. Fewer mistakes meant less restocking and fewer angry customers.
“The AI doesn’t replace my guys. It makes them twice as fast. My new hires sound like they’ve been here ten years.” — Owner, Orlando industrial distributor
What We’d Do Differently / Honest Caveats
If we were to do this again, we’d start with a smaller pilot. We spent two months on data cleanup because the ERP was a mess. A fractional AI officer could’ve helped them standardize data before we started. On running costs, the vector database is $40/month, and the synonym extraction pipeline still needs about 4 hours of manual review a month. That’s fine for a 200,000-SKU catalog, but for larger catalogs, you’d want more automation.
One caveat: the system works great for English descriptions. They had a few Spanish-speaking customers, and we didn’t handle that well. We’d add multilingual embeddings if we did it again.
Finally, we underestimated how much staff would resist at first. They thought the AI would replace them. We spent a week training them, showing them the tool was just a faster lookup. Once they saw it worked, they adopted it quickly. Change management matters more than you’d think.
Conclusion: Semantic Search Is Practical, Not Magic
This project wasn’t about buzzwords. It was about solving a concrete problem: counter staff spending 8 minutes per call hunting for parts. We used embeddings, a vector database, and a synonym dictionary built from their own data. The result? Faster searches, fewer missed calls, and happier customers.
If you’re a distributor — or any business with a large catalog and vague search terms — semantic search might help. Start with a free readiness assessment to see if your data is ready. Or contact us to talk about your specific situation.
Frequently asked questions
What is semantic search?
Semantic search uses AI to understand the meaning behind a query, not just the exact words. It converts text into vectors (lists of numbers) and finds the closest matches in a database. That way, “thingy on the pump” can match “impeller seal kit” even though they share no keywords.
How much does a semantic search system cost?
For a 200,000-SKU catalog, our cloud costs were about $40/month for the vector database. The main investment is the initial setup (data cleanup, embedding generation, synonym extraction), which took us about 6 weeks of part-time work.
Do I need to retrain my staff?
Yes, but it's minimal. We trained the counter staff in one day. The interface is a simple search box. The hardest part is getting them to trust the results — but after a week, most prefer it over the old system.
Can this work for other types of businesses?
Absolutely. Any business with a large product catalog and vague customer descriptions can benefit. We've also done similar projects for e-commerce, medical parts, and auto parts. The approach is the same.
How do you handle synonyms that aren't in the dictionary?
We have a monthly review process. We analyze search logs for new query terms that didn't match well, then use a word2vec model to suggest new synonyms. A human approves or rejects them. This keeps the system improving over time.
What if my data is messy?
Data cleanup is the first step. We can help with that, but it's often the most time-consuming part. A <a href="/ai-readiness-assessment/">readiness assessment</a> can identify the biggest issues before we start building.