<i>An anonymized case study of a Lake Nona physical therapy clinic that used Whisper transcription and a custom LLM pipeline to turn spoken notes into structured SOAP notes, saving 40 minutes per clinician per day while keeping HIPAA front and center.</i>
(Client details are anonymized and some specifics composited at the client’s request.)
I walked into the clinic just off Lake Nona’s Medical City around 8:30 on a Tuesday. The waiting room was already half full. Behind the front desk, a physical therapist was scribbling furiously on a notepad between patients. She looked up and said, “I spend more time writing notes than I do treating people.” That line stuck with me.
Her clinic—a four-clinician physical therapy practice serving mostly ortho and post-surgical patients—was drowning in documentation. Each therapist was spending roughly 90 minutes a day on SOAP notes (Subjective, Objective, Assessment, Plan). That’s 6 hours of clinician time per day, across the team. At their billing rate, that was about $4,500 a month in lost revenue opportunity. And that didn’t count the burnout.
They’d tried voice dictation before. Dragon Medical One worked okay, but it still required the therapist to say punctuation and manually structure the note. They’d also tried a templated dropdown system in their EHR, but it felt rigid and missed the nuance of each patient’s story. Nothing stuck.
So they called me. Their ask was simple: “Can you make it so I talk, and the note writes itself?” The answer was yes—but with some honest caveats.
The Situation: What Was Breaking
The clinic’s documentation workflow looked like this: therapist treats patient, jots down a few keywords on paper, then between patients or at end of day, types a full SOAP note into their EHR. That 90 minutes per day was fragmented—10 minutes here, 15 there. It meant notes were often completed hours after the visit, risking accuracy. And it ate into time they could’ve spent with family, or frankly, just decompressing.
More critically, the backlog of unfinished notes was growing. At any given time, each therapist had 5-7 notes from the previous day still pending. That’s a compliance risk and a billing delay. The clinic owner told me, “We’re leaving money on the table because notes aren’t done.”
They needed a system that was fast, accurate, and HIPAA-compliant. They also needed it to work with their existing EHR, which had a web-based API for importing notes.
What They’d Tried Before
Before calling me, the clinic had attempted several solutions. The first was a medical scribe service—a remote person who’d listen to recordings and type notes. That cost $15 per hour, per clinician, and introduced latency and privacy concerns. They dropped it after three months.
Then they tried a consumer-grade voice-to-text app (like Otter.ai) and manually copying text into their EHR. That saved maybe 20 minutes a day, but the output was messy—full of filler words, hallucinations, and no structure. Therapists still had to edit heavily.
They also looked at a few AI scribe startups, but the monthly per-clinician fees were steep (around $200 per provider) and the integrations were half-baked. They wanted something tailored to their specific workflows.
The AI Work: What We Built
We designed a pipeline that takes a therapist’s spoken narrative and outputs a structured SOAP note. Here’s the technical breakdown, in plain English.
Step 1: Secure Recording. The therapist uses a HIPAA-compliant mobile app (we used a custom wrapper around the device’s microphone, with local encryption) to record their dictation immediately after each patient. The audio never touches a public cloud without encryption. We used AES-256 at rest and TLS 1.3 in transit.
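As an illustration of the "TLS 1.3 in transit" requirement, here is a minimal Python sketch of a client-side TLS context that refuses older protocol versions. It is not the clinic's app code. The function name `make_upload_tls_context` is made up for this example, and AES-256 encryption at rest is left out because Python's standard library has no AES implementation.

```python
import ssl


def make_upload_tls_context() -> ssl.SSLContext:
    """Build a client TLS context that only negotiates TLS 1.3 or newer."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Refuse TLS 1.2 and older, matching the "TLS 1.3 in transit" requirement.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


# Hypothetical usage: pass this context to whatever HTTPS client uploads audio.
context = make_upload_tls_context()
```

A context like this can be handed to `http.client.HTTPSConnection` or `urllib.request.urlopen` through their `context` argument, so an upload fails outright if the server cannot speak TLS 1.3.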
Step 2: Whisper Transcription. The encrypted audio file gets sent to a dedicated server running OpenAI’s Whisper model (large-v2). We chose Whisper because it’s open-source, accurate on medical terminology, and can be self-hosted. The server is in a private subnet with no internet access except through a secure API gateway. We fine-tuned Whisper on a small set of de-identified PT notes to improve accuracy on terms like “patellofemoral” and “eccentric loading.”
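For readers who want to see what the transcription call looks like, here is a minimal sketch using the open-source `openai-whisper` Python package. It loads the stock `large-v2` weights, not the fine-tuned checkpoint described above, and it assumes the audio has already been decrypted to a local file and that `ffmpeg` is installed. The function name `transcribe_dictation` is hypothetical.

```python
from pathlib import Path


def transcribe_dictation(audio_path: str, model_name: str = "large-v2") -> str:
    """Transcribe one decrypted dictation file on the local machine."""
    path = Path(audio_path)
    if not path.is_file():
        # Fail early, before paying the cost of loading a multi-gigabyte model.
        raise FileNotFoundError(f"no audio file at {path}")

    # Imported lazily so this module still loads where openai-whisper is absent.
    import whisper

    # Assumption: stock weights. A fine-tuned checkpoint would be loaded here instead.
    model = whisper.load_model(model_name)
    result = model.transcribe(str(path), language="en")
    return result["text"].strip()
```

In a long-running service you would load the model once at startup and reuse it for every dictation, since loading dominates the cost of a short recording.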
Step 3: LLM Structuring. The raw transcript is then passed to a local LLM (we used a fine-tuned Llama 3 8B model) that converts it into a structured SOAP note. We used a technique called retrieval-augmented generation (RAG) to pull standard templates and common diagnoses from a vector database. The prompt instructs the LLM to output JSON with fields: subjective, objective, assessment, plan. We also added a “confidence score” for each field.
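Whatever model produces the JSON, it helps to validate the output before a therapist ever sees it. The sketch below assumes one plausible shape for the output: each SOAP field is an object with `text` and a `confidence` between 0 and 1. That shape, the `review_threshold` value, and the function name `parse_soap_note` are assumptions for illustration, not the exact schema used at the clinic.

```python
import json

SOAP_FIELDS = ("subjective", "objective", "assessment", "plan")


def parse_soap_note(raw: str, review_threshold: float = 0.8) -> dict:
    """Parse LLM output, require all four SOAP fields, flag low-confidence ones."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    note = {}
    for field in SOAP_FIELDS:
        if field not in data:
            raise ValueError(f"missing SOAP field: {field}")
        text = data[field]["text"]
        confidence = float(data[field]["confidence"])
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"empty text for field: {field}")
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range for field: {field}")
        note[field] = {
            "text": text.strip(),
            "confidence": confidence,
            # Fields below the threshold get highlighted in the review dashboard.
            "needs_review": confidence < review_threshold,
        }
    return note


# Made-up example output, not real patient data.
example_raw = json.dumps({
    "subjective": {"text": "Reports knee pain on stairs.", "confidence": 0.95},
    "objective": {"text": "Knee flexion 110 degrees.", "confidence": 0.90},
    "assessment": {"text": "Improving post-op mobility.", "confidence": 0.60},
    "plan": {"text": "Continue eccentric loading 2x/week.", "confidence": 0.85},
})
example_note = parse_soap_note(example_raw)
```

Rejecting malformed output at this stage means the reviewer only ever sees notes with all four sections present, and the `needs_review` flag gives the dashboard something concrete to highlight.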
Step 4: Human-in-the-Loop Review. The note is presented in a simple web dashboard where the therapist can review and edit before finalizing. We deliberately kept a human in the loop because SOAP notes are legal documents. The therapist can accept, edit, or reject any field. Review takes about 2-3 minutes per note, compared with roughly 15 minutes to write a full note by hand.
Step 5: EHR Integration. Once approved, the note gets pushed to their EHR via API. We built a custom connector using n8n, an open-source workflow automation tool. The connector maps the JSON fields to the EHR’s required format and handles error logging.
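The real connector lives in n8n, but the mapping logic is easy to show in a few lines of Python. This is an illustrative sketch only: the payload keys (`patientId`, `visitDate`, `noteType`, `sections`) are hypothetical and do not correspond to any particular EHR's API.

```python
SOAP_FIELDS = ("subjective", "objective", "assessment", "plan")


def to_ehr_payload(note: dict, patient_id: str, visit_date: str) -> dict:
    """Map an approved SOAP note onto a (hypothetical) EHR import payload."""
    missing = [f for f in SOAP_FIELDS if not note.get(f)]
    if missing:
        # Refuse to push incomplete notes; the caller should log and surface this.
        raise ValueError(f"cannot export note, missing fields: {missing}")
    return {
        "patientId": patient_id,
        "visitDate": visit_date,
        "noteType": "SOAP",
        "sections": [
            {"heading": field.capitalize(), "body": note[field]}
            for field in SOAP_FIELDS
        ],
    }


# Made-up example values, not real patient data.
payload = to_ehr_payload(
    {
        "subjective": "Reports knee pain on stairs.",
        "objective": "Knee flexion 110 degrees.",
        "assessment": "Improving post-op mobility.",
        "plan": "Continue eccentric loading 2x/week.",
    },
    patient_id="demo-001",
    visit_date="2024-01-09",
)
```

Keeping the mapping in one small function makes it straightforward to swap the target format when a different EHR is involved.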
One thing that was harder than expected: Whisper struggled with background noise. The clinic had open treatment bays with other therapists talking, machines beeping, and patients chatting. We had to add noise suppression preprocessing (using a small neural network) before feeding audio to Whisper. That improved word-level transcription accuracy from about 82% to 94%, which is roughly equivalent to cutting the word error rate from 18% to 6%.
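Transcription quality is usually scored with word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. Word accuracy is commonly reported as 1 minus WER. Here is a small, dependency-free sketch of that calculation. The example sentences are made up, and the lowercase-and-split normalization is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "patient reports patellofemoral pain with eccentric loading"
hypothesis = "patient reports patella femoral pain with eccentric loading"
wer = word_error_rate(reference, hypothesis)  # 2 edits over 7 reference words
```

Splitting "patellofemoral" into "patella femoral" costs one substitution plus one insertion, so a single misheard clinical term produces a WER of 2/7, about 29%, on this short sentence. That is why domain vocabulary matters so much for dictation.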
Where We Kept a Human in the Loop
I already mentioned the review step. But we also kept a human for two other reasons. First, the LLM occasionally “hallucinated” a finding that wasn’t in the dictation—for example, adding “patient reports pain with stairs” when the therapist only mentioned “stairs” in passing. The review step catches that. Second, the clinic’s therapists wanted to maintain their clinical judgment. They didn’t want a black box writing notes without oversight.
We also built a feedback loop: when a therapist edits a note, that edit is logged and periodically used to fine-tune the LLM further. Over three months, the edit rate dropped from 30% of notes to about 12%.
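The edit rate above is simply the share of notes where the approved version differs from what the pipeline generated. A minimal sketch of how that could be tracked is below. The `ReviewRecord` structure and function names are hypothetical, and the records are made-up examples.

```python
from dataclasses import dataclass
from typing import Dict, List

SOAP_FIELDS = ("subjective", "objective", "assessment", "plan")


@dataclass
class ReviewRecord:
    note_id: str
    generated: Dict[str, str]  # what the pipeline produced
    approved: Dict[str, str]   # what the therapist signed off on


def edited_fields(record: ReviewRecord) -> List[str]:
    """Return the SOAP fields the therapist changed during review."""
    return [
        f for f in SOAP_FIELDS
        if record.generated.get(f) != record.approved.get(f)
    ]


def edit_rate(records: List[ReviewRecord]) -> float:
    """Fraction of notes with at least one edited field."""
    if not records:
        return 0.0
    edited = sum(1 for r in records if edited_fields(r))
    return edited / len(records)


base = {"subjective": "s", "objective": "o", "assessment": "a", "plan": "p"}
records = [
    ReviewRecord("n1", base, dict(base)),                       # accepted as-is
    ReviewRecord("n2", base, {**base, "plan": "p (revised)"}),  # plan edited
]
rate = edit_rate(records)
```

The per-field list from `edited_fields` is what makes the log useful as training data: it shows which section of the note the model got wrong, not just that something changed.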
The Measured Results
After a two-week pilot, we rolled out to all four clinicians. Here’s what we measured over 30 days:
- Time savings: Average time per note dropped from 15 minutes to 4 minutes (including review). That’s 40 minutes saved per clinician per day, or 2.7 hours per day across the team.
- Backlog eliminated: Within the first week, the pending notes backlog went from 20+ to zero. Therapists were completing notes before the end of their shift.
- Accuracy: We sampled 50 notes and compared them to a manual gold standard (a senior PT reviewed each). The pipeline’s output had a 92% accuracy rate on clinical facts. After therapist edits, that rose to 99%.
- Clinician satisfaction: In a survey, three of four therapists said they’d “strongly recommend” the system. The fourth said it was “good but needs better handling of multi-patient dictation” (fair point—we’re working on speaker diarization).
“I used to dread the end-of-day note session. Now I dictate while the patient is putting their shoes on, and the note is ready by the time I walk to the desk.” — Anonymous therapist at the clinic
What We’d Do Differently
No project is perfect. Here are honest caveats and lessons learned.
1. Self-hosting is not for everyone. We ran the pipeline on a dedicated server in the clinic’s office. That required IT support for updates, backups, and monitoring. For a smaller practice, a cloud-hosted solution with a Business Associate Agreement (BAA) might be better. We’re now offering a managed version through our fractional AI officer service.
2. Whisper’s accuracy varies by accent and speed. One therapist spoke very quickly with a Southern drawl, and his error rate was about 8% higher than the others’. We mitigated this by adding accent-specific fine-tuning data, but it took extra weeks.
3. The review dashboard needs to be mobile-friendly. Therapists wanted to review notes on their phones between patients. Our initial web dashboard was desktop-only. We rebuilt it with a responsive design in week three.
4. LLM cost and latency. Running a local LLM is cheaper per inference than cloud APIs, but the upfront hardware cost was around $3,000 for a GPU. For a clinic with 10+ clinicians, cloud might be more scalable. We’re evaluating Groq for faster inference.
5. Regulatory gray areas. We consulted with a healthcare attorney about using AI for clinical documentation. The consensus: as long as a human reviews and takes responsibility, it’s defensible. But the landscape is evolving. We recommend all clients take our AI readiness assessment to evaluate compliance risks.
Is This Right for Your Practice?
If you’re a PT clinic, or any healthcare practice that does structured documentation, this approach can work. The key ingredients are: willing clinicians, a receptive EHR, and a tolerance for some technical setup. The ROI is clear: 40 minutes per clinician per day translates to roughly $2,000 per month in reclaimed time for a four-person clinic.
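The ROI figure follows from scaling the baseline numbers quoted earlier in this article: 90 minutes of documentation per clinician per day corresponded to about $4,500 a month in lost opportunity, so reclaiming 40 of those 90 minutes recovers the same proportion of that cost. A quick sketch of the arithmetic, assuming the relationship is linear:

```python
BASELINE_MINUTES_PER_DAY = 90   # documentation time per clinician, before
BASELINE_MONTHLY_COST = 4500    # dollars of lost opportunity, whole team
MINUTES_SAVED_PER_DAY = 40      # per clinician, after rollout
CLINICIANS = 4

# Assumption: the cost scales linearly with documentation minutes.
monthly_reclaimed = BASELINE_MONTHLY_COST * MINUTES_SAVED_PER_DAY / BASELINE_MINUTES_PER_DAY

# Team-wide time saved per working day, in hours.
team_hours_saved_per_day = MINUTES_SAVED_PER_DAY * CLINICIANS / 60

print(f"${monthly_reclaimed:,.0f} per month, {team_hours_saved_per_day:.1f} hours per day")
```

To estimate this for a different practice, swap in your own clinician count and baseline cost. The linear scaling is the main simplification.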
But it’s not a magic bullet. If your clinicians aren’t tech-comfortable, or if your EHR doesn’t have an API, you’ll need a different path. We’ve helped other practices with simpler solutions like Microsoft 365 Copilot for summarizing emails and scheduling. And for those just starting out, our AI glossary can help demystify the terms.
If you want to explore this for your practice, reach out. We can start with a free 30-minute consultation to see if your workflow is a fit.
Frequently asked questions
How does the pipeline ensure HIPAA compliance?
All audio and text data is encrypted at rest and in transit. The Whisper and LLM models run on a local server in the clinic's office, not in the public cloud. We sign a Business Associate Agreement with any third-party vendors (like the EHR API). The human review step ensures no PHI is exposed without oversight.
What if the therapist has a strong accent or speaks quickly?
Whisper handles most accents well, but we saw higher error rates for very fast speakers. We mitigated this by fine-tuning on a small sample of the therapist's voice. For extreme cases, we recommend speaking at a moderate pace and using a good microphone.
Can this work with any EHR?
It depends on whether your EHR has a documented API for importing notes. We built custom connectors for two popular PT EHRs (WebPT and Clinicient). If your EHR doesn't have an API, we can explore alternative approaches like screen scraping or manual import.
How much does the system cost?
The upfront hardware cost was about $3,000 for a GPU server. Ongoing costs are minimal (electricity, maintenance). We charge a setup fee and a monthly retainer for support and fine-tuning updates. For clinics that prefer cloud hosting, we offer a per-clinician subscription starting at $150/month.
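For a rough sense of how the two hosting options compare, the sketch below works out when the hardware purchase pays for itself against the entry-level cloud price for a four-clinician practice. It deliberately ignores the setup fee, the support retainer, electricity, and IT time, none of which are quantified here, so treat it as a lower bound on the break-even point and not as a quote.

```python
import math

GPU_SERVER_COST = 3000          # one-time hardware cost, dollars
CLOUD_PRICE_PER_CLINICIAN = 150 # starting monthly subscription, dollars
CLINICIANS = 4

monthly_cloud_cost = CLOUD_PRICE_PER_CLINICIAN * CLINICIANS

# Months of cloud fees needed to equal the hardware outlay (hardware only).
breakeven_months = math.ceil(GPU_SERVER_COST / monthly_cloud_cost)

print(f"Cloud: ${monthly_cloud_cost}/month; hardware breaks even after {breakeven_months} months")
```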
What if the LLM makes a mistake?
The human review step is designed to catch errors. Therapists can edit any field before the note is submitted. We also log all edits to continuously improve the model. Over the first three months of use, the share of notes needing edits dropped from 30% to about 12%.
Is this only for physical therapy clinics?
No. The same pipeline can be adapted for any healthcare specialty that uses structured notes—occupational therapy, speech therapy, chiropractic, or even veterinary medicine. The key is having a consistent note structure and willing clinicians.
Ready to talk it through?
Send a one-line description of what you are trying to do, by email or through the contact form. I will reply within one business day with a plain-English next step.