How We Built AI Case Search for a Winter Park Law Firm

*A boutique firm's old paper-and-folder workflow became searchable, cited answers in weeks—not months.*

(Client details are anonymized and some specifics composited at the client’s request.)

I sat across the desk from three paralegals in a Winter Park law office last spring, watching one of them dig through a shared drive for a specific ruling from a case that’d closed in 2015. She opened folder after folder. Twenty minutes later, she found it—or thought she had. Another paralegal said, “Wait, I think the precedent is actually in the Gonzalez file from 2017.” They opened another folder. I’d seen this before: smart people doing important work, drowning in filing cabinets that happen to live on a server.

That’s when the managing partner asked the question I hear often: “Can AI help us find things faster?” The honest answer is: yes, but not the way most people think. It’s not magic search. It’s not ChatGPT pointed at your files. It’s a specific technique called retrieval-augmented generation—RAG—built on top of vector embeddings, and it requires real architecture decisions, access controls, and honesty about what it can and cannot do.

This is how we built it, what broke along the way, and what the paralegals actually got back: roughly 9 hours every week.

The Situation: 12 Years of Case Law Living in the Dark

The firm had handled roughly 400 closed cases over twelve years. Each case file was a mix: court documents, discovery, correspondence, internal memos, billing notes, and research. All of it was technically digitized—scanned PDFs, Word documents, spreadsheets—but scattered across a shared drive with a folder structure that’d made sense in 2012 and had fossilized since.

When a paralegal needed to find a precedent, a similar fact pattern, or even just a judge’s ruling style, they did what all humans do: they’d search their memory, ask a colleague, or open folders one by one. If someone remembered that “Judge Harrison handled something similar,” that was actually useful. But if the memory was fuzzy? You’re looking at hours.

The firm had tried a document management system once, about eight years prior. It cost money, required training, and nobody really used it because the old folder structure was just easier. They’d also tried a basic full-text search tool, but it returned thousands of results for a query like “damages” and sorting through them was almost as slow as folder-diving.

The bottleneck wasn’t technology. It was context. A paralegal needed to search semantically—meaning, find cases that meant something similar, not just cases that used the same keywords. And they needed to trust the results enough to cite them in court documents.

Why Simple Search Failed

Before we built anything, I asked the paralegals to describe a typical search. One example: “I’m researching contract interpretation in a case where the client signed something ambiguous. I need cases where Florida courts ruled on intent versus plain language.” A keyword search for “contract” and “interpretation” returns hundreds of results. Most are noise. The paralegal then has to open documents, skim them, and mentally rank them by relevance. That’s 45 minutes of work.

Look, full-text search also doesn’t understand relationships. It can’t answer, “Show me cases where the defendant was a contractor AND there was a dispute about payment AND the judge sided with the plaintiff.” Keywords alone just can’t do that.

The other issue: confidence. The paralegals needed to cite sources precisely. If an AI system said, “This case supports your argument,” they had to be able to see exactly which document and which paragraph that came from. Vague answers were worthless—actually worse than worthless, because they could lead to bad citations.

The Technical Build: Vector Embeddings and Retrieval-Augmented Generation

Here’s what we built, in plain terms.

First, we extracted all the case files—about 12,000 documents ranging from a few pages to 200+ pages. We chunked them into reasonable segments, usually 500–800 words per chunk, with overlap at boundaries to preserve context. We didn’t throw out metadata; every chunk kept its source file, date, case name, and attorney.
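
The chunking step can be sketched roughly like this. It is a minimal illustration, not the code we shipped: the `Chunk` class, the `chunk_document` function, and the default sizes are hypothetical names and values chosen to match the description above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_file: str
    case_name: str
    date: str
    attorney: str


def chunk_document(text, metadata, max_words=650, overlap_words=75):
    """Split text into word-based chunks that overlap at the boundaries.

    metadata is a dict with source_file, case_name, date, and attorney;
    every chunk carries a copy so results can always be traced to a file.
    """
    if overlap_words >= max_words:
        raise ValueError("overlap_words must be smaller than max_words")
    words = text.split()
    step = max_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append(Chunk(text=" ".join(window), **metadata))
        if start + max_words >= len(words):
            break  # this window already reached the end of the document
    return chunks
```

Splitting on words rather than sentences keeps the sketch short. A production chunker would more likely break at paragraph or sentence boundaries.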

Next, we converted each chunk into a vector embedding—a mathematical representation of its meaning, created by a language model. The key insight: documents with similar meanings get similar vectors. A chunk about breach of contract and damages will sit closer to another chunk about contract violation and injury than to a chunk about trademark infringement. We used OpenAI’s embedding model (text-embedding-3-small) because it’s fast, well-supported, and accurate enough for legal work.

We stored these vectors in a vector database (Pinecone, managed and simple). When a paralegal searches for “cases about contractor disputes and payment delays,” we convert their query to a vector and ask the database: “Show me the 20 chunks with vectors closest to this one.” Instant semantic search.
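
To make "closest vectors" concrete, here is a small in-memory stand-in for what the vector database does at query time. It is a sketch under assumptions: cosine similarity as the distance measure, a brute-force scan instead of a real index, and toy three-dimensional vectors invented for illustration (real embeddings have far more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=20):
    """index is a list of (chunk_id, vector) pairs; returns the k best (chunk_id, score) pairs."""
    scored = [(chunk_id, cosine_similarity(query_vector, vec)) for chunk_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors: the two contract chunks point in a similar direction,
# the trademark chunk points somewhere else.
toy_index = [
    ("contract-breach-damages", [0.9, 0.1, 0.0]),
    ("contract-violation-injury", [0.8, 0.2, 0.1]),
    ("trademark-infringement", [0.1, 0.1, 0.9]),
]
query = [0.85, 0.15, 0.05]
results = top_k(query, toy_index, k=2)
```

With these toy numbers, the two contract chunks come back and the trademark chunk does not. A managed database does the same ranking, just with an index that avoids scanning every vector.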

Then comes the RAG part. We take those 20 top chunks and feed them, along with the user’s question, to a large language model—GPT-4 in this case. The LLM reads the retrieved chunks and synthesizes an answer, citing which documents it used. The answer is grounded in the actual case files. The paralegal sees both the answer and the sources.
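
The "feed them to the model" step is mostly careful prompt assembly. The sketch below shows one way to number the retrieved chunks so the model can cite them. The function name, the dictionary keys, and the instruction wording are all hypothetical, and the actual call to the language model is left out.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble a prompt that restricts the model to numbered, retrieved sources.

    retrieved_chunks is a list of dicts with case_name, source_file, and text.
    """
    source_blocks = []
    for number, chunk in enumerate(retrieved_chunks, start=1):
        header = f"[{number}] {chunk['case_name']} | {chunk['source_file']}"
        source_blocks.append(f"{header}\n{chunk['text']}")
    sources = "\n\n".join(source_blocks)
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


# Invented example data, not from the firm's files.
example_chunks = [
    {"case_name": "Example v. Sample", "source_file": "order.pdf",
     "text": "The court found the payment clause ambiguous."},
    {"case_name": "Placeholder v. Demo", "source_file": "memo.docx",
     "text": "Plain language controlled over stated intent."},
]
prompt = build_rag_prompt("How have courts treated ambiguous payment clauses?", example_chunks)
```

Because each source carries a bracketed number tied to a file, the interface can turn every citation in the answer into a link back to the original document.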

We built this as a web app: Python backend with FastAPI, React frontend, deployed on AWS. Paralegals log in, type a question in plain English, get results in 2–3 seconds with full citations.

One thing I’ll admit was harder than I expected: handling PDF scans. Not all documents were born digital. Some were scans of old paper files. We had to OCR these. We used Tesseract, which is open source and worked fine overall, but OCR isn’t perfect. Handwritten notes, faded copies, and weird fonts caused real headaches. We had to validate the OCR output and, in a few cases, manually re-process certain documents. That added two weeks to the project timeline.
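
One cheap way to triage OCR output is to score how much of a page looks like ordinary words and send low scorers to a human. This is a hypothetical heuristic for illustration, not the validation we ran: the regular expression, the function names, and the 0.8 threshold are all assumptions you would tune on your own documents.

```python
import re

# A token counts as plausible if, ignoring surrounding punctuation, it is a
# word of two or more letters (optionally hyphenated or with an apostrophe),
# one of the single-letter words a/A/I, or a number such as 2015 or 03/12/2015.
PLAUSIBLE_TOKEN = re.compile(
    r"^[\W_]*(?:(?:[A-Za-z]{2,}|[AaI])(?:['’-][A-Za-z]+)*|\d+(?:[.,:/-]\d+)*)[\W_]*$"
)


def ocr_quality_score(text):
    """Fraction of whitespace-separated tokens that look like real words or numbers."""
    tokens = text.split()
    if not tokens:
        return 0.0
    good = sum(1 for token in tokens if PLAUSIBLE_TOKEN.match(token))
    return good / len(tokens)


def needs_manual_review(text, threshold=0.8):
    """Flag pages whose score falls below the (assumed) threshold."""
    return ocr_quality_score(text) < threshold
```

A score like this only catches garbled characters. It will not notice a clean-looking page where the OCR read the wrong number, which is one more reason a person still checks the source.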

The Human Guardrails We Built In

The single biggest design decision: we kept the paralegals in the loop. The system doesn’t write answers that go directly into court documents. It generates a draft citing sources. A human always reviews it.

Here’s why. Language models hallucinate. They’re good, but they can invent citations that sound plausible but don’t exist, or they can misinterpret a ruling. A lawyer or paralegal catching that mistake before it hits a brief? That’s the whole game. We made it dead obvious what the sources are: you can click any citation and open the original document right there.

We also built strict access controls. Not every user can search every case. We tied access to the firm’s matter management system—a paralegal can only search cases and documents they’re already assigned to. This was legally important and also reduced noise.
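
In code terms, the rule is simply "filter by assignment before anything reaches the user." The sketch below assumes a hypothetical `assignments` mapping exported from the matter management system; the function name and data shapes are illustrative.

```python
def allowed_chunks(chunks, user_id, assignments):
    """Keep only chunks belonging to cases the user is assigned to.

    chunks: list of dicts that each include a case_name key.
    assignments: dict mapping user_id to a set of case names (assumed shape).
    Unknown users get an empty result rather than everything.
    """
    permitted_cases = assignments.get(user_id, set())
    return [chunk for chunk in chunks if chunk["case_name"] in permitted_cases]
```

In a real deployment the same restriction is better applied as a metadata filter inside the vector query, so that restricted chunks are never retrieved in the first place.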

We set up logging. Every search is logged: who searched, what they asked, what they got. If there’s ever a question about a citation or a decision, the audit trail is there.
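
An audit entry does not need to be elaborate. Here is a minimal sketch that writes one JSON line per search; the field names and the append-only file are assumptions for illustration, not a description of the firm's actual log format.

```python
import json
from datetime import datetime, timezone


def audit_record(user_id, question, source_ids):
    """Build one JSON line recording who searched, what they asked, and what came back."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "question": question,
        "source_ids": list(source_ids),
    })


def append_audit_record(path, record):
    """Append the record to a log file, one entry per line."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(record + "\n")
```

One line per search keeps the log easy to filter later by user, date, or cited source.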

And we didn’t try to automate the cite-checking. We just gave the paralegals a better starting point. They still verify everything. That’s not a limitation of the system; it’s a feature.

What the Paralegals Actually Use It For

Three months in, the system was handling about 40–50 searches per week across the three-person team. The questions broke down like this:

Precedent hunting (“Has our firm handled a case with similar facts?”). Used to take 1–2 hours; now takes 10 minutes because the paralegal gets a ranked list of semantically similar cases.
Judge research (“What’s Judge Harrison’s ruling pattern on X?”). Instead of scrolling through case summaries, paralegals search for cases she’s decided that involve the relevant legal issue. About 15–20 minutes saved per search.
Discovery strategy (“What damages have been awarded in similar cases?”). The system surfaces comparable cases with damages amounts cited. This one saves maybe 30–45 minutes, because the alternative is manual review of historical damages worksheets.
Opposing counsel intel (“Has this firm argued this position before in our case law?”). Less common, but occasionally paralegals search for patterns in how opposing counsel tends to frame arguments in past cases the firm’s handled.

“The system doesn’t write briefs for us. It finds the cases we’d spend half a day hunting for. That’s worth everything.”

The Numbers That Mattered

The firm’s three paralegals collectively logged about 12 hours per week in what they called “file hunting” before we built this. That’s not billable; it’s overhead. After the system went live, they estimated they were spending 3 hours per week on searches—a savings of about 9 hours weekly. Conservative estimate, they said.

That 9 hours translates to about $2,340 per month at loaded paralegal rates (roughly $60/hour burdened, so 9 × $60 × 52 ÷ 12). Annual savings: about $28,000. The system cost $22,000 to build and $800/month to operate, so net of operating costs the payback period is roughly 14 months.
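
If you want to run the same arithmetic for your own firm, the calculation is short. This is a sketch with a hypothetical function name; it assumes 52 working weeks spread evenly over 12 months and subtracts operating cost from the monthly savings before dividing.

```python
def payback_months(hours_saved_per_week, hourly_rate, build_cost, monthly_operating_cost):
    """Months until cumulative net savings cover the one-time build cost."""
    monthly_savings = hours_saved_per_week * hourly_rate * 52 / 12
    net_monthly = monthly_savings - monthly_operating_cost
    if net_monthly <= 0:
        return None  # at these inputs the system never pays for itself
    return build_cost / net_monthly


# Inputs taken from the figures in this section.
months = payback_months(
    hours_saved_per_week=9,
    hourly_rate=60,
    build_cost=22_000,
    monthly_operating_cost=800,
)
```

Swap in your own hours, rate, and costs. The result is most sensitive to the hours saved, so that is the number worth measuring carefully before you commit.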

But the real value was tougher to quantify: faster case preparation, fewer missed precedents, higher confidence in citations. One of the attorneys told me, “I sleep better knowing they have sources right there.” That’s not a number, but it matters.

What We’d Do Differently—and Honest Caveats

The system works well, but it’s not perfect. Here’s what I’d change:

Better OCR upfront. We should’ve human-verified the OCR output before vectorizing. It’d have cost a bit more but saved time later. Instead, we caught bad OCR during user testing.

Finer chunking strategy. We discovered that our 500–800-word chunks were sometimes too long. A 700-word chunk on three different legal topics meant the vector was averaged across all three, making it less precise. We now use 300–400-word chunks with more overlap, which improved precision.

More user training, sooner. The first week, paralegals were phrasing questions too specifically or too vaguely. A 15-minute chat about how to write a search query would’ve saved confusion. We did that in month two; we should’ve done it day one.

What it doesn’t do: It doesn’t predict case outcomes. It doesn’t replace legal research. It’s not a general-purpose AI; it’s a search engine that understands meaning. If you ask it something completely outside the case files, it’ll tell you it doesn’t have that information. That’s by design.

How to Know If Your Firm Needs This

You’re a candidate for RAG case search if you’ve got:

More than 100 closed cases or documents
Paralegals spending more than 5 hours per week hunting files
A shared drive or document system that’s more than three years old
A willingness to let AI be a starting point, not an answer

If you’ve got all four, it’s worth a conversation. If you think AI will write your briefs for you, it won’t—and that’s okay. But if you think AI could give your team back their time, it can.

Building this kind of system starts with an AI readiness assessment—essentially, an audit of your documents, workflows, and pain points. That conversation helps us scope what’s realistic and what kind of return you should expect. From there, we spec the build, test with real data, and iterate. If you’re thinking about this for your own firm or business, let’s talk about your situation.

Frequently asked questions

What's the difference between this and just using ChatGPT on my case files?

ChatGPT isn't trained on your files, and uploading documents to ChatGPT raises confidentiality concerns. This system indexes your own files, embeds them into a searchable database, and always cites specific sources. You also control access: only authorized staff see authorized cases. Plus, the search is semantic, not keyword-based, so you get more relevant results.

How long does it take to build something like this?

For a firm with 200–500 cases and clean digital files, typically 6–10 weeks from kickoff to live system. If you have a lot of scans that need OCR, add 2–3 weeks. The biggest variable is document quality and organization. We also account for testing, training, and refinement.

What if we change a case file after it's been indexed?

The system can re-index. If you add new case files or update existing ones, we can schedule regular re-indexing (weekly, monthly, whatever makes sense). It's not a one-time thing; the index can stay current.

Can it handle different document types—PDFs, Word docs, emails?

Yes. We extract text from PDFs, Word, and other common formats. Emails are trickier because they're often scattered across inboxes, but if they're saved as files, they work. The limiting factor is usually document organization, not format.

Who pays for the embedding and language model API calls?

That's an ongoing cost we include in the system budget. For the firm in this case study, it's about $200–300/month for embeddings and LLM queries. A bigger firm with more searches might spend $500–1,000/month. It's small relative to the time savings.

How do we know the AI isn't making up citations?

The paralegals verify every result. The system shows sources clearly, and clicking a citation opens the original document. We also built in logging so you can audit what the system said and where it looked. The human is always the final arbiter of accuracy.
