How We Deployed Vision Inspection for a Melbourne Machine Shop

*An anonymized case study: how we built a computer-vision quality-inspection assist for an aerospace machine shop in Melbourne — flagging surface defects on a line, the false-positive tuning that mattered, and why the human inspector stayed the final word.*

(Client details are anonymized and some specifics composited at the client’s request.)

I got a call from a machine shop in Melbourne that makes precision aerospace components. They had a quality problem: every day, a human inspector sat at a bench under bright lights, rotating parts and looking for tiny surface defects — scratches, dings, discolorations. The work was tedious, and fatigue set in after about 90 minutes. They estimated they were missing about 4% of defects, which meant rework downstream and occasional angry calls from customers. They’d tried hiring more inspectors, but the work was hard to staff. They’d tried better lighting and magnifiers. Nothing moved the needle.

They asked if AI could help. I told them it could — but not as a replacement. Here’s what we actually built.

The Situation: What Was Breaking

The shop ran three shifts, producing roughly 1,200 small aluminum and titanium parts per day. Each part needed a 30-second visual check under a microscope. The inspector would look for scratches deeper than 0.005 inches, pits larger than 0.002 inches, and any discoloration from heat treat. They had written standards, but interpretation varied between inspectors. The defect rate that made it to customers was about 1.5% — too high for aerospace. Internally, they were scrapping or reworking 6% of parts, costing roughly $4,500 per month in lost material and labor.

The biggest pain point? After 90 minutes of inspection, accuracy dropped. They measured a 12% increase in missed defects in the second hour of a shift. They’d tried rotating inspectors every hour, but that cut productivity. They needed a way to keep quality high without burning out their team.

What They Had Tried Before

They’d experimented with a commercial off-the-shelf vision system from a well-known vendor. It used traditional rule-based algorithms — thresholding, edge detection, blob analysis. The system worked for simple pass/fail checks like hole diameters, but it couldn’t handle the variability of surface defects. It flagged every scratch as a defect, even cosmetic ones that didn’t affect function. The false-positive rate was 35%, which meant the inspector had to re-check everything anyway. They abandoned it after three months.

They also tried hiring a second inspector per shift. That helped a little, but they couldn’t find enough qualified people. The work is boring, and Melbourne’s labor market is tight. They needed a different approach.

The AI Work We Did

I started with an AI readiness assessment to understand their data, infrastructure, and tolerance for change. They had a good collection of labeled images — about 8,000 photos of parts with known defects, plus 12,000 photos of good parts. The images were taken under consistent lighting, which made things easier. But the defects were subtle: a scratch might be only a few pixels wide.

We decided on a computer vision approach using a convolutional neural network (CNN): a ResNet-50 backbone pretrained on ImageNet and fine-tuned on their defect images. The first version made a binary accept/reject call; the deployed system reports three categories: “accept,” “reject,” and “review.” The “review” category, which we added during tuning, is for borderline cases: defects that might be cosmetic or might be real. We set the initial threshold conservatively, aiming for high recall (catching as many real defects as possible) even at the cost of more false positives.

We deployed the model on a small edge device — an NVIDIA Jetson Nano — mounted at the inspection station. The camera was a 12-megapixel industrial USB camera with a macro lens. We wrote a simple interface: the inspector places the part under the camera, presses a foot pedal, and the system captures an image, runs inference in under 0.5 seconds, and displays a green check, red X, or yellow triangle with a confidence score. The inspector then makes the final call.

Training took about 12 hours on a rented GPU instance. We used data augmentation — rotations, flips, brightness changes — to make the model robust to slight variations in lighting. We also used a technique called “focal loss” to handle the class imbalance (more good parts than defects).
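For readers who have not met focal loss, here is a minimal sketch of the binary form for a single part, written in plain Python so the arithmetic is visible. It is an illustration, not the training code from this project: the function name and the defaults gamma=2.0 and alpha=0.25 (the values commonly quoted for focal loss) are assumptions, and a real training run would use a vectorized version inside the deep learning framework.

```python
import math


def focal_loss(p_defect: float, is_defect: bool,
               gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example (illustrative sketch).

    p_defect is the predicted probability that the part is defective.
    gamma and alpha are assumed defaults, not values from the project.
    """
    # p_t is the probability the model gave to the true class.
    p_t = p_defect if is_defect else 1.0 - p_defect
    # alpha_t weights defects by alpha and good parts by (1 - alpha).
    alpha_t = alpha if is_defect else 1.0 - alpha
    # Clamp to avoid log(0) on extreme predictions.
    p_t = min(max(p_t, 1e-7), 1.0 - 1e-7)
    # (1 - p_t) ** gamma shrinks the loss on examples the model already gets right.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy_good_part = focal_loss(0.05, is_defect=False)  # confident and correct
    missed_defect = focal_loss(0.05, is_defect=True)    # confident and wrong
    print(f"easy good part: {easy_good_part:.6f}")
    print(f"missed defect:  {missed_defect:.6f}")
```

The point of the design is the scaling factor: the many easy good parts contribute almost nothing to the loss, so the rarer, harder defect examples dominate the gradient.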

The False-Positive Tuning That Mattered

The first version of the model had a false-positive rate of 22%. That was better than the old rule-based system, but still too high. Inspectors were ignoring the system because it cried wolf too often. We spent two weeks tuning.

First, we added a “review” category and adjusted the confidence thresholds. Instead of a binary accept/reject, we created a three-way decision. The model would flag clear defects as “reject,” clear good parts as “accept,” and everything else as “review.” The inspector then only had to spend time on the review pile, which was about 15% of parts. That cut the false-positive burden by half.
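The three-way decision can be sketched as two thresholds on the model’s defect probability. This is a hedged illustration: the function name and the threshold values 0.10 and 0.90 are hypothetical, chosen only to show the shape of the logic, and the real cutoffs would be tuned against the shop’s own data.

```python
def triage(p_defect: float,
           accept_below: float = 0.10,
           reject_above: float = 0.90) -> str:
    """Map a defect probability to "accept", "reject", or "review".

    The two thresholds are hypothetical placeholders, not tuned values.
    """
    if not 0.0 <= p_defect <= 1.0:
        raise ValueError("p_defect must be a probability in [0, 1]")
    if accept_below >= reject_above:
        raise ValueError("accept_below must be smaller than reject_above")
    if p_defect >= reject_above:
        return "reject"   # clear defect
    if p_defect <= accept_below:
        return "accept"   # clearly good part
    return "review"       # everything in between goes to the inspector


if __name__ == "__main__":
    for p in (0.02, 0.45, 0.97):
        print(p, triage(p))
```

Widening the gap between the two thresholds grows the review pile and shrinks the number of firm calls the model makes, which is the trade-off the tuning was about.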

Second, we used a technique called “test-time augmentation”: we ran inference on five slightly different crops of the same image and averaged the results. This smoothed out noise and reduced false positives by another 3%.
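A minimal sketch of test-time augmentation with five crops, using nested lists as a stand-in for image arrays. The choice of four corner crops plus a center crop is an assumption (the exact crops used in the project are not specified here), and `model` is a hypothetical callable that returns a defect probability for one crop.

```python
from statistics import mean
from typing import Callable, List

Image = List[List[float]]  # rows of pixel intensities, a stand-in for a real array


def five_crops(image: Image, crop_h: int, crop_w: int) -> List[Image]:
    """Return four corner crops and one center crop (an assumed crop scheme)."""
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop is larger than the image")
    offsets = [
        (0, 0),                                # top-left
        (0, w - crop_w),                       # top-right
        (h - crop_h, 0),                       # bottom-left
        (h - crop_h, w - crop_w),              # bottom-right
        ((h - crop_h) // 2, (w - crop_w) // 2),  # center
    ]
    return [
        [row[left:left + crop_w] for row in image[top:top + crop_h]]
        for top, left in offsets
    ]


def predict_with_tta(image: Image, model: Callable[[Image], float],
                     crop_h: int, crop_w: int) -> float:
    """Average the model's defect probability over the five crops."""
    scores = [model(crop) for crop in five_crops(image, crop_h, crop_w)]
    return mean(scores)
```

Averaging means a single noisy crop cannot push a part over the reject threshold by itself, at the cost of five inference passes per part instead of one.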

Third, we implemented a feedback loop. When an inspector overrode the system (e.g., accepted a part the model flagged as reject), that image was saved and later used for retraining. We retrained the model weekly for the first month, and the false-positive rate dropped to 8%.
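The override capture behind that loop can be sketched as a small queue that keeps only the disagreements. All names here (`InspectionRecord`, `RetrainingQueue`, the field names) are hypothetical, and treating “review” calls as non-overrides is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InspectionRecord:
    image_path: str
    model_call: str      # "accept", "reject", or "review"
    inspector_call: str  # "accept" or "reject"; the inspector has the final word


@dataclass
class RetrainingQueue:
    records: List[InspectionRecord] = field(default_factory=list)

    def log(self, record: InspectionRecord) -> bool:
        """Save the record if the inspector overrode a firm model call."""
        is_override = (
            record.model_call in ("accept", "reject")
            and record.model_call != record.inspector_call
        )
        if is_override:
            self.records.append(record)
        return is_override

    def labeled_examples(self) -> List[Tuple[str, str]]:
        """(image_path, label) pairs, labeled with the inspector's decision."""
        return [(r.image_path, r.inspector_call) for r in self.records]


if __name__ == "__main__":
    queue = RetrainingQueue()
    queue.log(InspectionRecord("part_001.png", "reject", "accept"))  # override
    queue.log(InspectionRecord("part_002.png", "accept", "accept"))  # agreement
    print(queue.labeled_examples())
```

The inspector’s decision becomes the training label, so the weekly retraining pulls the model toward the shop’s actual acceptance standard rather than the original labels.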

Here’s the thing: we didn’t try to eliminate false positives entirely. That would’ve driven false negatives up. Instead, we made the system transparent — the inspector could see the confidence score and the region of the image that triggered the alert. That built trust.

Why the Human Inspector Stayed the Final Word

From the start, I told the client that the AI would be an assist, not a replacement. Aerospace parts have zero tolerance for error. A missed defect could cause a failure in flight. The stakes are too high for a model to have the final say. Also, the model couldn’t judge certain things: a scratch that looks bad but is within spec, or a subtle discoloration that indicates a material problem but isn’t a defect per se. Those require context and experience.

We designed the workflow so that the inspector always makes the final decision. The system highlights suspicious areas and gives a recommendation, but the inspector can override it with one click. Over time, the inspector learned to trust the system for clear-cut cases and focus their attention on the ambiguous ones. That’s where their expertise added the most value.

We also kept a human in the loop for model updates. Every week, I reviewed the override logs with the quality manager. We looked for patterns: was the model missing certain defect types? Was it flagging a new kind of scratch that was actually acceptable? That feedback drove the retraining.

One thing that was harder than expected: getting the inspectors to trust the system. The first week, they ignored it. We had to run side-by-side comparisons — the inspector’s judgment versus the model’s — and show that the model caught defects they’d missed. Once they saw that, adoption picked up.

Measured Results

After three months, here’s what we measured:

Defect escape rate (defects that reached customers) dropped from 1.5% to 0.3% — an 80% reduction.
Internal scrap and rework costs fell from $4,500/month to $1,200/month.
Inspection throughput increased by 25% because inspectors could process parts faster on clear cases.
Inspector fatigue decreased. They reported less eye strain and could work longer without breaks.
The false-positive rate stabilized at 8%, and the review pile was about 15% of parts.
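For anyone who wants to check the percentages, the arithmetic behind these figures is short. The helper name is hypothetical; the inputs are the before and after numbers quoted in this write-up.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a metric fell, e.g. 0.8 for an 80% reduction."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


escape_reduction = relative_reduction(1.5, 0.3)       # escape rate, percent of parts
rework_reduction = relative_reduction(4500, 1200)     # scrap and rework, dollars per month
false_positive_reduction = relative_reduction(22, 8)  # false-positive rate, percent
monthly_saving = 4500 - 1200                          # dollars per month

if __name__ == "__main__":
    print(f"Escape rate reduction:    {escape_reduction:.0%}")
    print(f"Rework cost reduction:    {rework_reduction:.0%}")
    print(f"False-positive reduction: {false_positive_reduction:.0%}")
    print(f"Monthly saving:           ${monthly_saving:,}")
```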

But look, there are honest caveats. The model still struggles with certain defect types: very fine scratches that are barely visible, or defects hidden in shadows. We’re adding more training data for those cases. Also, the system requires ongoing maintenance: weekly retraining, occasional camera recalibration, and software updates. That’s not free. The client now has a part-time AI technician (one of their existing engineers) to handle it.

Another caveat: this system works for their specific parts and lighting. It wouldn’t transfer directly to a different shop without retraining. That’s why we always start with an assessment to understand the unique context.

What We’d Do Differently

If I could do it over, I’d spend more time on the user interface. The first version was clunky: a command-line tool that required the inspector to type a part number. We later replaced it with a simple touchscreen app that scans a barcode. That saved about 10 seconds per part, which at roughly 1,200 parts per day adds up to a little over 3 hours per day across all shifts, or about an hour per shift.

I’d also involve the inspectors earlier in the design process. We showed them a prototype after two weeks, but they had suggestions that would’ve been easy to incorporate from day one — like a bigger display and a foot pedal instead of a button. Small things matter.

Finally, I’d plan for the “review” pile from the start. We added it after the first false-positive complaints. It should’ve been in the initial design.

Closing

This project proved that AI can help quality inspection in manufacturing — but only when it’s designed as a tool for humans, not a replacement. The machine shop in Melbourne is now running the system on two shifts and planning to expand to a third. They’re also looking at using the same approach for incoming raw material inspection. The key was starting with a clear problem, tuning relentlessly, and respecting the expertise of the people on the line.

If you’re in Central Florida and wondering whether AI could help your operation, reach out. We can start with a conversation and see if there’s a fit.

Frequently asked questions

What computer vision model did you use?

We used a ResNet-50 CNN pretrained on ImageNet and fine-tuned on the client's defect images. We trained it with focal loss to handle class imbalance.

How did you reduce false positives?

We used a three-way decision (accept/reject/review), test-time augmentation, and a weekly retraining loop with inspector feedback. The false-positive rate dropped from 22% to 8%.

Why did you keep a human inspector?

Aerospace parts have zero tolerance for error. The AI can't judge context or subtle variations that require experience. The inspector makes the final decision.

How much did the system cost to deploy?

Costs included the edge device ($500), camera ($300), and about 40 hours of development time. Ongoing costs are modest but not zero: weekly retraining, occasional camera recalibration, and software updates, which one of the client's engineers handles part-time.

Can this system work for other types of parts?

It can, but it requires retraining on new defect images. The approach is transferable, but the model is specific to the parts and lighting it was trained on.

What was the hardest part of the project?

Getting the inspectors to trust the system. We had to run side-by-side comparisons and show that the model caught defects they missed before they adopted it.

Ready to talk it through?

Send a one-line description of what you are trying to do. I will reply within one business day with a plain-English next step.