Next-generation high-bandwidth memory HBM4 led by Samsung Electronics and SK Hynix. [Photo: Shutterstock]

Nvidia GPUs have accounted for a large share of AI infrastructure costs, but memory has recently emerged as an important factor as well.

As hyperscalers expand data centres, DRAM chip prices have risen roughly sevenfold over the past year. Against that backdrop, memory orchestration, which optimises memory use so that the right data reaches AI agents at the right time, is becoming increasingly important, TechCrunch reported on Feb. 17 (local time).

A company that excels at memory orchestration can process the same query with fewer tokens, which can affect its profit and loss, the report said.

Semiconductor analyst Doug O'Laughlin highlighted the importance of memory chips in a conversation with Val Bercovici, chief AI officer at Weka, published on the newsletter platform Substack.

Bercovici pointed in particular to the growing complexity of Anthropic's prompt caching documentation. "If you look at Anthropic's prompt caching pricing page, six or seven months ago it was a simple guide saying 'caching makes it cheaper'," he said. "Now it's encyclopaedia-level, even covering how many cache writes you need to buy in advance."

The key is how long Claude keeps prompts in cache memory. Users can pay for a 5-minute or a 1-hour cache window, and good cache management can cut costs, but adding new data can push existing data out of the cache.
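
The trade-off described above can be illustrated with a minimal Python sketch. It is a toy model, not Anthropic's implementation: the class name `TTLCache`, the fixed entry capacity, the least-recently-used eviction rule and the choice to refresh an entry's lifetime on every hit are all assumptions made for illustration.

```python
from collections import OrderedDict


class TTLCache:
    """Toy prompt cache (hypothetical): entries expire after a fixed lifetime,
    and the least recently used entry is evicted when capacity is exceeded."""

    def __init__(self, capacity, ttl_seconds):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._entries = OrderedDict()  # key -> expiry timestamp in seconds

    def lookup(self, key, now):
        """Return True on a hit. A hit refreshes the entry's lifetime (assumption)."""
        expiry = self._entries.get(key)
        if expiry is None or expiry <= now:
            self._entries.pop(key, None)  # drop the expired entry, if present
            return False
        self._entries[key] = now + self.ttl
        self._entries.move_to_end(key)  # mark as most recently used
        return True

    def write(self, key, now):
        """Insert or refresh an entry. Return the evicted key, or None."""
        self._entries[key] = now + self.ttl
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            evicted, _ = self._entries.popitem(last=False)  # oldest entry
            return evicted
        return None


cache = TTLCache(capacity=2, ttl_seconds=300)  # 5-minute lifetime
cache.write("system-prompt", now=0)
cache.write("tool-definitions", now=10)
cache.lookup("system-prompt", now=100)  # hit, so its lifetime is refreshed
evicted = cache.write("long-document", now=120)
print(evicted)  # tool-definitions
```

In this sketch the new entry pushes out the one that was touched least recently, which is the kind of effect that makes the order and timing of cache writes matter for cost.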

For startups, this can be an opportunity. TensorMesh, which specialises in cache optimisation, is cited as one of the startups drawing attention in this area.

How data centres use DRAM and HBM is also becoming an important question, including when to use DRAM instead of HBM and how to configure a model swarm so that it shares a cache.

As companies improve memory orchestration, token use falls and inference costs drop, and lower server costs make it more likely that AI applications can become profitable, TechCrunch reported.
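
The link between caching and inference cost can be sketched as simple arithmetic. The Python example below is a rough illustration under stated assumptions, not a quote of any provider's price list: the base price of 3.0 per million input tokens is hypothetical, the write and read multipliers are illustrative values of the kind providers publish (check the current pricing page before relying on them), and every follow-up request is assumed to arrive while the cached prefix is still alive.

```python
def uncached_cost(prefix_tokens, requests, price_per_mtok):
    """Cost of resending the same prompt prefix on every request."""
    return prefix_tokens * requests * price_per_mtok / 1_000_000


def cached_cost(prefix_tokens, requests, price_per_mtok, write_mult, read_mult):
    """Cost when the prefix is written to cache once and then read back.

    write_mult and read_mult are illustrative multipliers on the base input
    price (assumptions, not taken from the article).
    """
    write = prefix_tokens * price_per_mtok * write_mult
    reads = prefix_tokens * price_per_mtok * read_mult * (requests - 1)
    return (write + reads) / 1_000_000


PRICE = 3.0        # hypothetical base price per million input tokens
PREFIX = 100_000   # tokens in the shared prompt prefix
REQUESTS = 10      # requests that reuse the prefix

baseline = uncached_cost(PREFIX, REQUESTS, PRICE)
short_window = cached_cost(PREFIX, REQUESTS, PRICE, write_mult=1.25, read_mult=0.10)
long_window = cached_cost(PREFIX, REQUESTS, PRICE, write_mult=2.0, read_mult=0.10)

print(f"no cache:      {baseline:.3f}")      # 3.000
print(f"short window:  {short_window:.3f}")  # 0.645
print(f"long window:   {long_window:.3f}")   # 0.870
```

Under these assumptions the cached runs cost a fraction of the uncached one, but the saving disappears if requests arrive after the cached prefix has expired and it has to be written again, which is why the choice of cache window matters.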

Copyright © DigitalToday. All rights reserved. Unauthorized reproduction and redistribution are prohibited.