AI infrastructure company Dinotisia said on Wednesday it has released a paper and source code for a KV cache compression technology. A KV cache is temporary memory on the GPU that stores context a large language model (LLM) has already read, so the model does not have to recompute it.
The company said the newly released STAR-KV technology is the result of joint research between UC San Diego’s VVIP Lab and Dinotisia researchers. The paper was selected as a Spotlight paper at the International Conference on Machine Learning (ICML) 2026.
According to the experiments reported in the paper, STAR-KV reduced KV cache size by up to 75 percent using low-rank compression alone. Combined with the mixed-precision quantization technique proposed in the paper, it compressed the overall KV cache by up to 20 times.
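To put those figures in perspective, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It uses the standard sizing formula for a KV cache (keys plus values, per layer, per head, per token) with a hypothetical model shape chosen only for illustration. The assumption that the low-rank and quantization stages multiply is also ours, and the two "up to" figures may come from different experimental settings.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 accounts for storing both K and V.
    bytes_per_elem=2 corresponds to 16-bit storage (e.g. FP16).
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical model shape and context length (assumptions, not from the paper).
baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=128_000)

# "Up to 75 percent" smaller means one quarter remains: a 4x compression ratio.
low_rank_ratio = 1 / (1 - 0.75)
low_rank_only = baseline / low_rank_ratio

# "Up to 20 times" overall compression when quantization is added.
combined_ratio = 20
combined = baseline / combined_ratio

# If the two stages simply multiply (an assumption), this is the share
# of the overall ratio that quantization would have to contribute.
implied_quant_factor = combined_ratio / low_rank_ratio

GIB = 1024 ** 3
print(f"baseline:       {baseline / GIB:.2f} GiB")
print(f"low-rank only:  {low_rank_only / GIB:.2f} GiB")
print(f"combined:       {combined / GIB:.2f} GiB")
print(f"implied quantization factor: {implied_quant_factor:.1f}x")
```

Under these assumed dimensions, a single long-context sequence occupies several gibibytes of GPU memory uncompressed, which is why the compression ratio directly affects how many requests one GPU can serve at once.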
KV cache compression has emerged as a key technical challenge in the AI infrastructure industry. Research on easing the memory bottlenecks of long-context AI is intensifying, and TurboQuant, disclosed by Google researchers, has drawn attention. Against that backdrop, STAR-KV presents a new approach that combines low-rank compression with quantization and GPU execution optimization, the company said.
ICML, which selected the STAR-KV paper, is regarded as a leading international conference in AI and machine learning, alongside NeurIPS and ICLR. ICML 2026 will be held at COEX in Seoul from July 6 to 11.
Dinotisia plans to develop STAR-KV further so that it can be used in real AI service environments. It also plans to make the technology usable in open-source LLM inference frameworks such as vLLM.
Dinotisia CEO Moo-kyung Jung (정무경) said, "Technologies are advancing to allow AI to process longer context faster at lower cost." He added, "STAR-KV is a technology that substantially solves the core bottleneck problems of KV cache capacity and attention processing speed, and Dinotisia will contribute to the AI inference ecosystem through open-sourcing."