The analysis recalculates AI power consumption using real-service GPU utilisation and batch processing rather than a simple average. [Photo: Shutterstock]

[DigitalToday reporter Jinju Hong (홍진주)] Microsoft (MS) has released an analysis showing that sending a single question to a large language model (LLM) consumes far less electricity than widely cited estimates suggest. The company said that in a real service environment, energy use per AI query could be up to 20 times lower than previous estimates.

On June 16 (local time), online media outlet Gigazine reported that MS recently published an analysis of the power efficiency of AI inference on its cloud blog.

MS said that, assuming a typical response length of about 300 tokens, the median energy consumption per query was about 0.31 Wh. The middle 50 percent of the distribution ranged from 0.16 to 0.60 Wh. That range is roughly the electricity needed to run a 1,000-watt microwave for about 0.6 to 2 seconds.
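The microwave comparison is a unit conversion, and a minimal sketch makes it easy to check. The function name `seconds_at_power` is an illustrative choice, and the percentile labels assume that "the middle 50 percent" means the 25th to 75th percentile range. The energy figures are the ones quoted above.

```python
def seconds_at_power(energy_wh: float, appliance_watts: float = 1000.0) -> float:
    """Return how many seconds an appliance of the given wattage runs on energy_wh."""
    joules = energy_wh * 3600.0  # 1 Wh = 3,600 J
    return joules / appliance_watts  # 1 W = 1 J/s


# Per-query figures quoted in the article (about 300 output tokens).
figures_wh = [
    ("25th percentile", 0.16),
    ("median", 0.31),
    ("75th percentile", 0.60),
]

for label, wh in figures_wh:
    print(f"{label}: {wh} Wh is about {seconds_at_power(wh):.2f} s at 1,000 W")
```

The conversion gives roughly 0.58, 1.12 and 2.16 seconds, which matches the article's "about 0.6 to 2 seconds".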

The analysis comes as concerns grow that data centre power consumption is surging with the spread of generative AI. When users ask Copilot or chatbots to summarise emails, organise meetings or write code, LLM inference tasks are carried out in data centres. As questions and answers get longer, the number of tokens to process rises and power use increases as well.

MS said earlier power estimates did not adequately reflect real operating conditions. It said batch processing, which handles many requests at once, and GPU utilisation in large-scale service environments were not sufficiently considered in earlier calculations. It added that results can also differ significantly depending on whether calculations include only GPUs or also include CPUs and cooling facilities.

Researchers carried out the analysis assuming a large model with more than 200 billion parameters running on a server equipped with 8 Nvidia H100 GPUs. They built a scenario similar to a real service environment by reflecting token processing speed, server power consumption and the data centre power usage effectiveness (PUE).
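One way to see how these three inputs combine is the sketch below. It is not Microsoft's model: the article does not give the values used, so every number here is a hypothetical placeholder, and the function `energy_per_query_wh` is an illustrative name. The sketch assumes the server draws constant power and that its aggregate token throughput is shared across all requests in a batch.

```python
def energy_per_query_wh(
    output_tokens: int,
    server_power_w: float,
    server_tokens_per_s: float,
    pue: float,
) -> float:
    """Estimate facility-level energy (Wh) attributable to one query.

    server_tokens_per_s is the aggregate throughput of the whole server
    across all batched requests, so heavier batching raises it.
    """
    busy_seconds = output_tokens / server_tokens_per_s  # server time used by this query
    it_energy_wh = server_power_w * busy_seconds / 3600.0  # watt-seconds to Wh
    return it_energy_wh * pue  # PUE scales IT energy up to whole-facility energy


# HYPOTHETICAL inputs, chosen only to show the shape of the calculation.
SERVER_POWER_W = 10_000.0  # assumed draw of an 8-GPU server
PUE = 1.2                  # assumed data centre overhead factor

batched = energy_per_query_wh(300, SERVER_POWER_W, 3_000.0, PUE)
unbatched = energy_per_query_wh(300, SERVER_POWER_W, 300.0, PUE)
print(f"well-batched server: {batched:.2f} Wh per query")
print(f"poorly utilised server: {unbatched:.2f} Wh per query")
```

Under these assumptions, a tenfold difference in aggregate throughput produces a tenfold difference in energy per query, which is why estimates that ignore batching and utilisation can come out much higher.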

MS also released estimates for water use. It said cooling water use per typical query ranged from 0 to 0.067 mL. Based on the median, that is less than one-hundredth of a teaspoon.

MS stressed that not all AI requests use the same level of resources. When response length rises to about 5,000 tokens, as in long code generation or multi-step reasoning, median energy consumption rises to 3.91 Wh, about 13 times that of a typical question. MS said that because longer responses sharply increase power use, the energy efficiency of future AI services could depend more on processing methods and response length than on the number of questions.

The gap grows when scaled to overall service volume. Processing 1 billion typical questions in a day was estimated to require about 0.7 gigawatt-hours of electricity. The company said that applying additional optimisation could lower it to about 0.3 gigawatt-hours. If only 10 percent of total requests shift to long reasoning tasks, power demand rises to about 1.7 gigawatt-hours, and even after optimisation about 0.8 gigawatt-hours is needed, the analysis showed.
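The fleet-level figures can be turned back into per-query averages with simple arithmetic. The sketch below does that. It assumes the daily totals scale linearly with query count and that, in the mixed scenario, the remaining 90 percent of requests keep the same average as in the all-typical scenario. Neither assumption is stated in the article, and the function names are illustrative.

```python
QUERIES_PER_DAY = 1_000_000_000
WH_PER_GWH = 1e9


def implied_wh_per_query(total_gwh: float, queries: int = QUERIES_PER_DAY) -> float:
    """Average energy per query (Wh) implied by a daily fleet total in GWh."""
    return total_gwh * WH_PER_GWH / queries


def implied_long_query_wh(mixed_gwh: float, typical_gwh: float, long_share: float) -> float:
    """Average Wh per long query implied by a mixed-workload total.

    Solves mixed_avg = (1 - long_share) * typical_avg + long_share * long_avg
    for long_avg, assuming typical queries keep their all-typical average.
    """
    mixed_avg = implied_wh_per_query(mixed_gwh)
    typical_avg = implied_wh_per_query(typical_gwh)
    return (mixed_avg - (1.0 - long_share) * typical_avg) / long_share


# Daily totals quoted in the article: (all-typical GWh, 10%-long GWh).
scenarios = {"baseline": (0.7, 1.7), "optimised": (0.3, 0.8)}

for name, (typical_gwh, mixed_gwh) in scenarios.items():
    typical = implied_wh_per_query(typical_gwh)
    long_avg = implied_long_query_wh(mixed_gwh, typical_gwh, long_share=0.10)
    print(f"{name}: typical {typical:.1f} Wh, implied long {long_avg:.1f} Wh per query")
```

Under these assumptions, the baseline implies about 0.7 Wh per typical query and about 10.7 Wh per long query, and the optimised case implies about 0.3 Wh and 5.3 Wh. The baseline averages are higher than the medians quoted earlier (0.31 Wh and 3.91 Wh). The article does not explain the difference. One possible reading is that the fleet totals reflect averages over a skewed distribution rather than medians.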

MS said it expects significant room for efficiency improvements. It said wider use of smaller models, infrastructure optimisation and the introduction of next-generation GPUs and its own AI chips could raise energy efficiency per query by 8 to 20 times.

The release reads as a direct response to the debate over the power burden of AI services. Industry discussions have widely cited estimates that each AI question requires several watt-hours, along with claims that a single ChatGPT query uses about 10 times more electricity than a Google search.

MS said that once GPU utilisation and batch processing in real service environments are taken into account, those estimates appear to have been 4 to 20 times higher than actual consumption. It also said that because power use rises quickly for long responses and complex reasoning, the focus of AI power discussions should shift from the number of questions to operating methods and response length.

Copyright © DigitalToday. All rights reserved. Unauthorized reproduction and redistribution are prohibited.