The pilot-to-plant gap

Why industrial AI dies between the demo that works and the plant that runs. A named gap, its mechanism, and a test a board can apply.

[Figure: a blueprint-style diagram of a layered industrial framework.]

The demo works. The pilot works. Then nothing happens for three years, and often nothing happens at all.

This is the pattern that catches boards off guard in industrial AI, and in most industrial technology. The question they ask is whether the thing works. By the time they are asking it, the answer is almost always yes. The hard part is the plant: running the technology at scale, on the balance sheet, inside the safety case. That is where the value is won or lost, and the gap between demo and plant is the subject here.

Name it plainly. Every industrial technology climbs a five-rung ladder: demo, pilot, first-of-a-kind, nth-of-a-kind, fleet. A demo shows the physics. A pilot shows it on one line, watched by the people who built it. A first-of-a-kind is the first unit that has to pay for itself with no special handling. Nth-of-a-kind is the tenth unit, where the cost curve starts to bend. Fleet is the technology running as normal infrastructure. Most hype dies between the pilot and the first-of-a-kind. That crossing is the gap.

The mechanism

Three costs arrive at the same rung, and they arrive together.

The first is unit economics at scale. A pilot is subsidised by attention. The vendor sends its best engineers, the site clears its best line, and nobody counts the hours. At scale the subsidy ends. The technology has to clear a return on capital against the incumbent process, which is depreciated, understood, and already paid for. A model that lifts throughput four percent in a pilot has to survive integration, downtime, and the cost of the people who now run it differently. The number that mattered in the pilot is rarely the number that decides the plant.
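The arithmetic behind that last point can be sketched in a few lines. This is a rough illustration, not a valuation model: every name and figure is hypothetical, and it assumes margin scales linearly with throughput and that only a share of the pilot lift survives integration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScaleCase:
    """Hypothetical inputs for one line moving from pilot to a first unprotected unit."""
    baseline_margin: float  # annual contribution margin of the line today
    pilot_lift: float       # fractional throughput lift seen in the pilot (0.04 = four percent)
    realised_share: float   # assumed share of that lift surviving integration, 0 to 1
    downtime_cost: float    # annual cost of added downtime at scale
    operating_cost: float   # annual cost of the people who now run it differently
    capital: float          # upfront capital for the unit


def net_annual_gain(case: ScaleCase) -> float:
    """Lift actually realised, minus the recurring costs the pilot never counted."""
    gross = case.baseline_margin * case.pilot_lift * case.realised_share
    return gross - case.downtime_cost - case.operating_cost


def simple_payback_years(case: ScaleCase) -> Optional[float]:
    """Years to recover the capital; None when the unit never pays back."""
    gain = net_annual_gain(case)
    return case.capital / gain if gain > 0 else None


# Illustrative numbers only, not drawn from any real plant.
example = ScaleCase(
    baseline_margin=50_000_000,
    pilot_lift=0.04,
    realised_share=0.5,
    downtime_cost=300_000,
    operating_cost=400_000,
    capital=1_500_000,
)

pilot_headline = example.baseline_margin * example.pilot_lift
print(f"pilot headline: {pilot_headline:,.0f}")
print(f"net at scale:   {net_annual_gain(example):,.0f}")
print(f"payback years:  {simple_payback_years(example)}")
```

With these made-up inputs, a headline lift worth two million a year nets out at three hundred thousand once half the lift is lost to integration and the recurring costs are counted, which is a five-year payback on the capital. The pilot number and the plant number are different numbers.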

The second is brownfield integration. Almost no industrial deployment lands on a clean site. It lands on a plant built in 1994, with control systems from three vendors and a maintenance schedule set by a union agreement. The process knowledge that makes it run is tacit, held by four people who are near retirement. The last part of the deployment, fitting the technology to that inherited site, consumes most of the value. The pilot flatters because it runs on the clean twenty percent of the job. The plant disappoints because the messy eighty percent is where the work is.

The third is the workforce and the safety case. An industrial change is not live until an operator will stand next to it, a regulator will sign it, and an insurer will price it. Each of those asks the same question the demo did not have to answer: what happens when it fails, and who is standing there when it does? Answering it means logged failure modes, retraining, and a paper trail that takes quarters. This work is the deployment, and treating it as friction to be removed is the error that opens the gap.

Procurement sits underneath all three. Industrial buyers run on annual capital cycles, multi-year approval chains, and reference-customer requirements. A technology can be ready and still wait eighteen months for a budget line to open. The AI vendor prices in quarters. The plant buys in years. The two clocks do not meet at the pilot, and the gap between them is measured in working capital the vendor does not have.

What the evidence shows

The pattern is now measurable in the data. MIT’s Project NANDA study in 2025 found that ninety-five percent of organisations deploying generative AI saw no measurable return. Enterprise AI spending that year ran to over six hundred billion dollars. Most of those pilots worked. They cleared the demo. They failed to cross into the plant, and the reasons cluster around integration, missing success metrics, and the absence of anyone who owned the crossing.

The grid tells the same story in physical form. Berkeley Lab’s 2025 interconnection study looked at all the generation capacity that entered the US interconnection queue between 2000 and 2020. Only about thirteen percent had reached commercial operation by the end of 2024. The projects were not fake. The technology worked. Roughly three-quarters were withdrawn in the crossing, defeated by cost, siting, and a wait now measured at over five years from request to operation. The demo of a solar farm is a signed lease. The plant is a connected asset. The distance between them is the gap, priced in gigawatts.

The test a board can apply

Do not ask whether the technology works. Ask which rung it is on, and what the next rung costs.

A board can run this in one meeting. For any technology being sold to it, place it on the five-rung ladder using evidence alone. The vendor’s deck stays closed. A single supervised pilot is rung two, whatever the pitch says. Then ask the only question that matters at that rung: what does the first unprotected unit cost to build and run, with no special engineering attention, on our actual sites? If the team cannot answer, the technology has not crossed the gap, and the board is being shown a demo dressed as a deployment.

Two follow-ups sharpen it. First, who on our side owns the messy eighty percent (the integration, the retraining, the safety case), and is that person funded? Second, which clock governs the timeline: the vendor’s or our capital cycle? A technology that is real on the ladder, owned on the integration, and honest on the clock is worth capital. One that scores well only on the demo is worth a second pilot, and no more.
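For a board that wants the test written down as a checklist, it can be sketched as below. This is one reading of the test, not a scoring tool: the class names, field names, and the mapping from answers to a verdict are all assumptions made for illustration. Only the five rungs and the "tenth unit" threshold come from the ladder described earlier.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    DEMO = 1
    PILOT = 2
    FIRST_OF_A_KIND = 3
    NTH_OF_A_KIND = 4
    FLEET = 5


@dataclass
class Evidence:
    """What the board can verify for itself, with the vendor's deck closed (hypothetical fields)."""
    supervised_pilots: int = 0                   # pilots watched by the people who built them
    unprotected_units: int = 0                   # units paying their way with no special handling
    runs_as_normal_infrastructure: bool = False  # operated like any other plant asset


def place_on_ladder(e: Evidence, nth_threshold: int = 10) -> Rung:
    """Return the highest rung the evidence supports, checking from the top down."""
    if e.runs_as_normal_infrastructure and e.unprotected_units >= nth_threshold:
        return Rung.FLEET
    if e.unprotected_units >= nth_threshold:
        return Rung.NTH_OF_A_KIND
    if e.unprotected_units >= 1:
        return Rung.FIRST_OF_A_KIND
    if e.supervised_pilots >= 1:
        return Rung.PILOT  # a single supervised pilot is rung two, whatever the pitch says
    return Rung.DEMO


@dataclass
class BoardAnswers:
    """The three questions from the test, answered yes or no (hypothetical fields)."""
    first_unit_cost_known: bool      # can the team cost the first unprotected unit on our sites?
    integration_owner_funded: bool   # does a funded person own the messy eighty percent?
    timeline_on_our_clock: bool      # is the plan set to our capital cycle, not the vendor's?


def board_verdict(a: BoardAnswers) -> str:
    """Capital only when all three answers are yes; otherwise a second pilot at most."""
    if a.first_unit_cost_known and a.integration_owner_funded and a.timeline_on_our_clock:
        return "worth capital"
    return "a second pilot, and no more"


pitch = Evidence(supervised_pilots=1)
print(place_on_ladder(pitch).name)
print(board_verdict(BoardAnswers(False, True, True)))
```

The sketch is deliberately strict: one missing answer is enough to withhold capital. A board may prefer a softer rule, but the point of writing it down is that the vendor's pitch is not an input to either function.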

The gap is a reason to price industrial AI correctly, not a reason to avoid it. The winners in the next decade are the firms that build the crossing and own the eighty percent. They hold their nerve through the safety case while the hype cycle moves on. The firms with the best demos are a different set.

This is the lens the diagnostic applies before it recommends a single pound of capital.