Surging power demand from single AI prompts puts post-transformer architectures in focus

Better AI performance cannot be achieved simply by expanding data centres. [Photo: Reve AI]

A surge in power demand driven by the spread of artificial intelligence is exposing the limits of transformer-based large language models and increasing the need for next-generation architectures to replace them. A growing view holds that expanding data centres alone will not be enough to meet rising compute demand. The industry is turning to post-transformer structures as a way to improve power efficiency.

Tech news outlet TechRadar reported on April 9 that the core problem is that AI models still improve performance by relying on more compute, more layers and more data. Bain & Company forecasts that annual spending related to data centres will reach $500 billion in 2030. The Stargate initiative involving SoftBank, OpenAI and Oracle is also seen as a response to this growth in demand. Grid operators, by contrast, warn that AI demand could strain supply and destabilise energy markets.

Reasoning-focused models that have spread recently are adding to the power burden. One study found a long prompt for GPT-4o consumes about 0.42 watt-hours, while DeepSeek-R1 uses more than 33 watt-hours and GPT-4.5 about 30 watt-hours. That means a single prompt could require more energy than charging a smartphone.
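The gap between these figures is easier to see as a ratio. The sketch below uses only the per-prompt numbers quoted above, treating "more than 33 watt-hours" as a lower bound. The smartphone battery capacity of 15 watt-hours is an assumption for illustration and does not come from the study or the report.

```python
# Per-prompt energy figures quoted in the article, in watt-hours.
# DeepSeek-R1 is reported as "more than 33", so 33.0 is a lower bound.
ENERGY_WH = {
    "GPT-4o": 0.42,
    "GPT-4.5": 30.0,
    "DeepSeek-R1": 33.0,
}

# Assumption (not from the article): one full smartphone charge is about 15 Wh.
PHONE_BATTERY_WH = 15.0


def ratio_to_baseline(model, baseline="GPT-4o"):
    """How many times more energy `model` uses per prompt than `baseline`."""
    return ENERGY_WH[model] / ENERGY_WH[baseline]


def phone_charges(model):
    """Energy of one prompt expressed in full smartphone charges."""
    return ENERGY_WH[model] / PHONE_BATTERY_WH


if __name__ == "__main__":
    for name in ENERGY_WH:
        print(f"{name}: {ratio_to_baseline(name):.1f}x GPT-4o, "
              f"{phone_charges(name):.2f} phone charges")
```

Under that battery assumption, the DeepSeek-R1 figure works out to roughly 79 times the GPT-4o figure and a little over two phone charges per prompt.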

The transformer architecture is cited as a reason behind this cost structure. Because its attention mechanism compares every token with every other token, computation grows quadratically with input length, and the model continues to require high-speed memory and the processing of large numbers of parameters. With reasoning capabilities added, token usage has also surged. Early models responded with hundreds of tokens, but recent models use thousands of reasoning tokens to work through their answers step by step.
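As a rough illustration of why longer outputs are costly: full self-attention in a standard transformer compares each token with every other token, so the number of comparisons grows with the square of the sequence length. The sketch below counts only those pairwise comparisons and ignores constant factors, caching, and the rest of the network. The function names and the token counts of 500 and 5,000 are hypothetical, chosen to match "hundreds" versus "thousands" of tokens.

```python
def attention_pair_count(num_tokens):
    """Token-to-token comparisons in one full self-attention pass."""
    return num_tokens * num_tokens


def growth_factor(short_len, long_len):
    """How much the comparison count grows when the sequence gets longer."""
    return attention_pair_count(long_len) / attention_pair_count(short_len)


if __name__ == "__main__":
    # Hypothetical lengths: a short reply versus a long reasoning trace.
    for n in (500, 5_000):
        print(f"{n} tokens -> {attention_pair_count(n):,} comparisons")
    print(f"growth: {growth_factor(500, 5_000):.0f}x")
```

In this simplified count, ten times as many tokens means one hundred times as many comparisons.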

Some point out that scaling up no longer delivers the performance gains it once did. A view is spreading in the industry that transformer-based models are approaching the limits of what further scaling can achieve.

One alternative being put forward is a structure that mimics the human brain. The brain runs a network of fewer than 100 billion neurons and hundreds of trillions of synapses on about 20 watts of power. The Brain-inspired Dragon Hatchling, or BDH, architecture is a representative example of this approach. Instead of using all parameters at the same time, this structure selectively activates only the artificial neurons related to a task. It also reflects synaptic plasticity, in which connection strength changes during use, to improve learning efficiency.
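The two ideas described here, selective activation and connection strengths that change with use, can be sketched generically. The toy example below is not the BDH implementation; it is a minimal illustration that assumes a simple top-k selection rule and a basic Hebbian-style update, with hypothetical function names, scores, and learning rate.

```python
def top_k_active(scores, k):
    """Indices of the k highest-scoring neurons; all others stay inactive."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def hebbian_update(weights, active, rate=0.1):
    """Strengthen connections between neurons that were active together."""
    for i in active:
        for j in active:
            if i != j:
                weights[i][j] += rate
    return weights


if __name__ == "__main__":
    # Hypothetical relevance of five neurons to the current task.
    scores = [0.1, 0.9, 0.3, 0.8, 0.05]
    weights = [[0.0] * 5 for _ in range(5)]

    active = top_k_active(scores, k=2)   # only two of five neurons fire
    hebbian_update(weights, active)      # their mutual connection grows

    print("active neurons:", active)
    print("updated weight 1->3:", weights[1][3])
```

Only the selected neurons take part in the update, so the work done per step depends on how many neurons fire rather than on the total size of the network.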

The structure aims to reduce inference costs and token usage by using only what is needed rather than repeatedly running the entire model. It is also characterised by the ability to accumulate intelligence during actual use without regular large-scale retraining. Companies adopting AI see it as a way to lower cost burdens, and AI developers view it as a structure that can improve performance and energy efficiency at the same time.

Another condition is compatibility with existing infrastructure, because the likelihood of adoption rises if a new architecture can be applied without a wholesale replacement of that infrastructure.

Ultimately, AI's power problem is expanding beyond an environmental concern into an issue of economic feasibility. The industry believes that if a new architecture emerges that can sharply reduce inference costs, the very basis of AI competition could change. The key question going forward is whether post-transformer structures can demonstrate performance, cost efficiency and compatibility at the same time in real-world environments.

Jinju Hong hongjj@d-today.co.kr
