It started as a routine infrastructure upgrade.

A carrier came out to replace the router at one of the plant locations as part of a planned refresh. Standard procedure. The kind of thing that happens dozens of times a year at manufacturing operations across the country.

The next day, the power went out. When it came back, the router didn’t.

What dead in the water actually looks like.

People talk about downtime in manufacturing like it’s an inconvenience. A slow morning. Some frustrated emails.

That’s not what it looks like from the inside.

When that circuit went down, the plant lost its connection to the ERP system at the home office. No ERP connection meant no order fulfillment. No order fulfillment meant no brick getting loaded. Trucks were coming. Orders existed. The brick was sitting in the yard. And none of it could move because the system that told the floor what to load was unreachable.

The PLCs running the kilns were onsite, so the fires kept burning and the manufacturing process continued. But the engineers who normally managed those systems remotely couldn’t see them from the office anymore. They had to get in their cars and drive to the plant just to do their jobs.

One downed router. Multiple locations affected. Engineers on the road. Loading operations stopped.

That’s what dead in the water actually looks like.

The carrier couldn’t fix it.

I went onsite and spent the better part of a day troubleshooting with the carrier. We worked through everything. Neither of us could figure out why the router wouldn’t come back up. A replacement was coming — but not for three days.

Three days in a brick manufacturing operation isn’t a number you absorb. It’s a number you feel in orders not fulfilled, trucks turned away, and a production schedule that doesn’t care about your circuit problem.

I wasn’t willing to wait three days.

The overnight fix.

Early the next morning I started working on an alternative. The pieces were unconventional but they were available: a cellular wireless connection, a software-based router I’d been working with for years, and a clear picture of the network architecture I needed to replicate.

The goal wasn’t to build something elegant. The goal was to build something that worked by the time the plant opened.

What got the plant back online wasn’t luck — it was the kind of instinct that only comes from years of solving problems that don’t have documented solutions. Knowing that the software router could do what I needed it to do. Knowing that mirroring the IP addressing scheme was the fastest path to transparent failover. Knowing, before I started, that the pieces I had available could be assembled into something that would work. That’s not something you find in a manual. It’s pattern recognition built from two decades of walking into broken environments and finding a way through.

I tunneled the cellular connection back to the home office through the software router and mirrored the IP addressing scheme from the downed circuit. The critical network segments — the ones the manufacturing equipment and ERP integration ran on — came back up as if the primary circuit had never gone down. The equipment didn’t know the difference. The engineers could see their systems again. The floor could load brick.
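To make the mirroring idea concrete, here is a minimal Python sketch. It is not the configuration used at the plant. The segment names, subnets, and gateway addresses are hypothetical, and `mirror_problems` is an illustrative helper, not part of any router product. It only checks that a backup addressing plan reuses the same subnets and gateway addresses as the primary, which is what lets equipment on those segments keep working without being reconfigured.

```python
import ipaddress

# Hypothetical addressing for illustration only; not the plant's real network.
PRIMARY = {
    "plc": {"subnet": "10.20.10.0/24", "gateway": "10.20.10.1"},
    "erp": {"subnet": "10.20.20.0/24", "gateway": "10.20.20.1"},
}


def mirror_problems(primary, backup):
    """Return a list of mismatches between a primary and a backup addressing plan."""
    problems = []
    for name, seg in primary.items():
        other = backup.get(name)
        if other is None:
            problems.append(f"{name}: missing from backup plan")
            continue
        # Compare parsed objects rather than raw strings.
        net = ipaddress.ip_network(seg["subnet"])
        if ipaddress.ip_network(other["subnet"]) != net:
            problems.append(f"{name}: subnet differs")
        gateway = ipaddress.ip_address(seg["gateway"])
        if ipaddress.ip_address(other["gateway"]) != gateway:
            problems.append(f"{name}: gateway differs")
        elif gateway not in net:
            problems.append(f"{name}: gateway outside subnet")
    return problems


if __name__ == "__main__":
    # A backup plan copied from the primary should report no problems.
    backup = {name: dict(seg) for name, seg in PRIMARY.items()}
    print(mirror_problems(PRIMARY, backup))
```

An empty list means the backup path presents the same addresses the equipment already expects. Any entry in the list points at a segment where devices would notice the change.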

By the time the plant came online that morning, the workaround was running. The replacement router arrived two days later. We cut back over and decommissioned the workaround.

Total additional downtime after the overnight fix: zero.

What we did after the crisis.

When the replacement router arrived and we cut back over to the primary circuit, I did something that doesn’t always happen after an emergency fix: I didn’t just close the ticket and move on.

We had gotten lucky on the timeline. But what we didn’t have was a system. And that was the real problem.

A single circuit. A single router. A single point of failure between the production floor and the systems that told it what to do. That vulnerability hadn’t changed just because we’d found a workaround. It was still there, waiting for the next power outage, the next failed hardware, the next carrier technician who couldn’t explain why the equipment wasn’t coming back up.

What we built instead was a system that could be deployed at any plant location in minutes — by whoever happened to be onsite, without needing me on the road.

Pre-configured machines. Mirrored network architecture. A plug-it-in solution that any staff member could execute from a simple set of instructions. If the primary circuit ever went down again, the failover wasn’t a phone call to me and a two-hour drive. It was a power cable and a couple of minutes.

We shipped a unit to each plant location. They sat on a shelf, ready. The next time a circuit failed — and in manufacturing, there is always a next time — the plant manager didn’t call me in a panic. They plugged in the box and called me to let me know it was working.

That’s the difference between fixing an outage and solving the problem behind it.

What this story is actually about.

It’s not about cellular modems or software-based routing tools or IP addressing schemes.

It’s about what happens in the aftermath of a crisis that most people skip — the part where you take the thing that just broke your operation and make sure it can never break it the same way again.

Every manufacturing operation has this vulnerability somewhere. A single circuit. A single router. A single point of failure between your production floor and the systems that tell it what to do. Most of the time it holds. When it doesn’t, the question isn’t just whether someone shows up to fix it — it’s whether they stick around long enough to make sure you’re never in that position again.

That’s the job.


If your operation is running on a single point of failure nobody’s had time to address, that conversation is worth having before the next outage.