Half of AI’s most expensive chips are sitting idle, says tech expert
The artificial intelligence boom, so far, has been defined by a single instinct: build more. More chips, more data centres, more power. Capital expenditure from the world’s largest tech firms is running into the hundreds of billions as they race to scale AI systems.
Analysts expect global data centre investment to continue rising sharply, with major cloud providers alone guiding tens of billions in annual capital expenditure tied directly to AI infrastructure.
New projects like hyperscale campuses, or Elon Musk’s proposed ‘Terafab’ chip complex in Texas, signal an industry betting that demand for compute will keep outstripping supply.
But Jürgen Hatheier, Ciena’s vice president of business development, told City AM: “Even today, where those data centers are highly optimized, you still have 50 per cent of the time that those GPUs spend on doing something just waiting for data to arrive”.
A single high-end GPU can cost between $20,000 and $30,000 and consume around a kilowatt of power. If it spends half its time idle, the lag in supporting infrastructure turns into an expensive problem.
Across large AI clusters, often made up of tens of thousands of such chips, that idle time quickly scales into a material drag on returns.
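To put rough numbers on that drag, the figures above can be plugged into a simple back-of-envelope calculation. The cluster size, electricity price and midpoint GPU cost below are illustrative assumptions, not reported figures.

```python
# Back-of-envelope estimate of what 50% GPU idle time costs,
# using the article's figures. All numbers are illustrative
# assumptions, not vendor pricing.

GPU_COST_USD = 25_000           # midpoint of the $20k-$30k range quoted
GPU_POWER_KW = 1.0              # ~1 kW per high-end GPU, per the article
IDLE_FRACTION = 0.5             # GPUs waiting for data half the time
CLUSTER_SIZE = 10_000           # "tens of thousands" of chips, lower bound
ELECTRICITY_USD_PER_KWH = 0.10  # assumed industrial rate
HOURS_PER_YEAR = 8_760

# Capital effectively parked while chips wait for data
idle_capital_usd = GPU_COST_USD * CLUSTER_SIZE * IDLE_FRACTION

# Power still drawn during idle hours over a year
idle_energy_usd = (GPU_POWER_KW * HOURS_PER_YEAR * IDLE_FRACTION
                   * ELECTRICITY_USD_PER_KWH * CLUSTER_SIZE)

print(f"Capital idle at any moment: ${idle_capital_usd:,.0f}")
print(f"Annual cost of idle power:  ${idle_energy_usd:,.0f}")
```

Even at these conservative assumptions, a 10,000-GPU cluster has $125m of hardware waiting for data at any given moment, plus millions a year in electricity burned while it waits.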
Training large language models
“The innovation on the compute side has happened… at a factor of three faster than what we have innovated on the connectivity side”, he said.
Indeed, AI has become very good at thinking very quickly, but hasn’t quite matched that speed when it comes to sharing information between systems.
That mismatch becomes increasingly visible, not to mention dangerous, as workloads scale. Training large models requires vast amounts of data to move between clusters of machines, often across multiple data centres.
In some enterprise use cases, datasets can run into tens of petabytes, meaning even high-capacity networks can take days or weeks to move information if not properly optimised.
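The arithmetic behind that claim is straightforward. The sketch below estimates transfer time for a multi-petabyte dataset over a single link; the 10 PB dataset size and the link speeds are illustrative assumptions, and real transfers would use many parallel links at less than full utilisation.

```python
# Rough transfer-time estimate for the multi-petabyte datasets the
# article mentions. Dataset size and link speeds are illustrative
# assumptions.

def transfer_days(dataset_petabytes: float, link_gbps: float) -> float:
    """Days to move a dataset over a single link at full utilisation."""
    bits = dataset_petabytes * 1e15 * 8   # decimal petabytes to bits
    seconds = bits / (link_gbps * 1e9)
    return seconds / 86_400               # seconds per day

# A 10 PB dataset over common data-centre interconnect speeds
t_100g = transfer_days(10, 100)
t_400g = transfer_days(10, 400)
print(f"10 PB at 100 Gbps: {t_100g:.1f} days")
print(f"10 PB at 400 Gbps: {t_400g:.1f} days")
```

Under these assumptions, a 10 PB dataset ties up a 100 Gbps link for over nine days, and a 400 Gbps link for more than two, which is why the article's "days or weeks" figure is plausible for unoptimised networks.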
Inference, the day-to-day running of AI tools, adds another layer of unpredictable, continuous demand. And, if the network cannot keep up, the machines pause.
“If you have… a machine that can’t communicate with another machine fast enough, you are wasting energy… you have GPUs that are sitting there, you have power that is being burned for nothing,” Hatheier said.
The issue is compounded by the rise of agentic AI systems, where multiple AI agents coordinate tasks in parallel.
In these environments, chips are constantly exchanging data. Industry research suggests that in such workloads, delays in data handling and orchestration, often managed by CPUs and networks, can account for the majority of system latency, leaving expensive compute resources underutilised.
The speed bottleneck
For companies building AI infrastructure, this shift means that simply owning the best chips on the market is no longer enough. What matters now is how quickly they can be fed with data, and how efficiently they can work together.
That priority has already begun influencing how networks are being built, with fibre routes and pre-deployed infrastructure becoming real competitive advantages in their own right.
Operators are increasingly investing ahead of demand, laying capacity early in anticipation of future AI workloads.
“Any investment we are making now… it will be consumed,” Hatheier said. “In the arms race of AI, it’s all about velocity, how quickly you really get this connected.”
The market is starting to reflect that, with shares in networking companies like Ciena surging. The company’s stock has risen sharply over the past year as investors position for a multi-year buildout in data infrastructure. Yet some analysts caution that valuations already assume sustained demand from hyperscalers, leaving little room for execution missteps.
At the same time, the strain is not limited to networking, and the broader supply chain is under increasing pressure.
In the US, electricity infrastructure is struggling to keep pace with data centre demand, with equipment shortages, labour constraints and rising costs delaying new capacity.
Meta, Microsoft and Alphabet dominate
Semiconductor production faces similar bottlenecks, with advanced manufacturing capacity still falling short of projected AI demand.
And, while companies like Meta, Microsoft and Alphabet continue to dominate spending, enterprises are increasingly experimenting with their own AI infrastructure for reasons of data control or sovereignty. That includes deploying on-premise GPU clusters, often at significant cost.
“There are tons, thousands of enterprises just buying a million dollar rack of GPUs and like ‘yeah, I want this in house and I can do whatever I want with my data’,” Hatheier said.
That broadens the pressure on networks. More users, more data, more movement, and not always in predictable patterns. While training demand is now relatively well understood, inference remains a point of uncertainty.
“On the inference side nobody knows… we don’t know what the next application’s going to be,” he added.
In parallel, the growing role of CPUs in coordinating these AI systems is emerging as another constraint, alongside connectivity.
New workloads are driving demand for server processors to handle data movement and task execution, with some estimates suggesting millions of CPUs will be required to support next-generation AI deployments.
Supply has struggled to keep pace, with chipmakers already warning of shortages and extended lead times.
Taken together, the picture points to a system under strain. And while compute may be the headline story, without the infrastructure to move data efficiently, and indeed the supporting hardware to manage it, much of that investment risks sitting idle.