A KAIST professor is drawing attention for taking part in the development of TurboQuant, a technique unveiled by Google Research and others that addresses AI bottlenecks by improving model memory efficiency by up to sixfold.
KAIST said on Thursday that Han Insoo (한인수), a professor in its Department of Electrical Engineering, participated in the development of TurboQuant, a next-generation quantisation algorithm unveiled by a joint team from Google Research, DeepMind and New York University to tackle memory overload, seen as a key limitation of AI models.
AI models operate by converting input data into vectors and then calculating the similarity between those vectors. Because this process relies on high-precision data, it requires vast memory resources, which has been cited as a major limitation.
TurboQuant uses quantisation, a technique that represents such high-precision data with fewer bits. By approximating decimal (real-valued) data as integers, it sharply reduces storage and computing burdens while preserving the core information.
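To make the general idea of quantisation concrete, the following is a minimal sketch of uniform scalar quantisation, not TurboQuant's actual scheme. The function names `quantise` and `dequantise`, the 4-bit width and the sample values are all assumptions chosen for illustration.

```python
def quantise(values, bits):
    """Map floats to integer codes in [0, 2**bits - 1] on a uniform grid."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Step size between neighbouring grid points (guard against a flat input).
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantise(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical high-precision values, stored here with 4 bits each.
values = [0.12, -0.87, 0.45, 0.99, -0.33, 0.05]
codes, lo, scale = quantise(values, bits=4)
approx = dequantise(codes, lo, scale)

# The rounding error per value is at most half a grid step.
max_error = max(abs(a - b) for a, b in zip(values, approx))
print(codes, round(max_error, 4))
```

Each value is stored as a small integer plus two shared parameters (`lo` and `scale`), which is where the memory saving comes from in this simplified setting.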
According to KAIST, the study showed that TurboQuant can cut memory use by up to a factor of six by efficiently compressing the information inside AI models with little loss of accuracy. A key outcome is that it effectively eases the memory bottleneck, considered the biggest obstacle in AI inference.
TurboQuant's core is a two-stage quantisation structure. In the first stage, it randomly rotates the input data and then quantises each element individually. This step reduces outliers in the data to improve compression efficiency. The approach was also used in the earlier PolarQuant research in which Han participated.
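The first stage can be illustrated with a small sketch under stated assumptions: a random orthogonal matrix built by Gram-Schmidt stands in for the random rotation, and a plain uniform grid stands in for the per-element quantiser. The helper names, the 64-dimensional input and the 4-bit width are hypothetical, and TurboQuant's actual rotation and codebook may differ.

```python
import math
import random


def random_rotation(dim, rng):
    """Random orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, x):
    """Matrix-vector product: apply the rotation to x."""
    return [sum(m * a for m, a in zip(row, x)) for row in matrix]


def rotate_back(matrix, y):
    """Apply the transpose, which inverts an orthogonal matrix."""
    return [sum(matrix[i][j] * y[i] for i in range(len(y)))
            for j in range(len(matrix[0]))]


def quantise_uniform(x, bits):
    """Quantise each element independently on a uniform grid."""
    lo, hi = min(x), max(x)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((a - lo) / scale) * scale for a in x]


def peak_to_rms(x):
    """How far the largest entry stands out from the typical entry."""
    rms = math.sqrt(sum(a * a for a in x) / len(x))
    return max(abs(a) for a in x) / rms


rng = random.Random(0)
dim = 64
x = [rng.gauss(0.0, 0.1) for _ in range(dim)]
x[3] = 25.0  # one large outlier dominates the vector

rot = random_rotation(dim, rng)
rotated = rotate(rot, x)

# The rotation keeps the vector's length but spreads the outlier's
# energy across many coordinates, so no single entry dominates.
print(round(peak_to_rms(x), 2), round(peak_to_rms(rotated), 2))

# Quantise each rotated element, then undo the rotation to reconstruct.
reconstructed = rotate_back(rot, quantise_uniform(rotated, bits=4))
```

The comparison printed here is only the peak-to-RMS ratio before and after rotation; the sketch makes no claim about the reconstruction error of the real algorithm.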
In the second stage, it quantises the residual error left over from the first stage. The QJL (Quantized Johnson-Lindenstrauss) technique applied here is an ultra-lightweight 1-bit method that represents data using only the values {-1, 1}. It can maximise computing efficiency while minimising information loss.
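As a rough illustration of how 1-bit values in {-1, 1} can still carry useful information, the sketch below keeps only the signs of random Gaussian projections of a vector, plus its norm, and uses them to estimate an inner product with an unquantised query. It relies on the standard fact that, for a Gaussian projection row s, the expectation of (s·q)·sign(s·k) equals sqrt(2/π)·⟨q, k⟩/‖k‖. The names `qjl_encode` and `qjl_inner_product`, the dimensions and the projection count are assumptions, and the vector called `residual` is a random stand-in rather than an actual first-stage error.

```python
import math
import random


def qjl_encode(vec, proj):
    """Keep one sign bit per random projection, plus the vector's norm."""
    signs = [1 if sum(p * v for p, v in zip(row, vec)) >= 0 else -1
             for row in proj]
    norm = math.sqrt(sum(v * v for v in vec))
    return signs, norm


def qjl_inner_product(query, signs, norm, proj):
    """Estimate <query, vec> using only vec's sign bits and norm."""
    m = len(proj)
    total = sum(s * sum(p * q for p, q in zip(row, query))
                for s, row in zip(signs, proj))
    # sqrt(pi / 2) undoes the shrinkage introduced by taking signs.
    return math.sqrt(math.pi / 2) * norm * total / m


rng = random.Random(1)
dim, num_proj = 16, 4000

# Shared random Gaussian projection matrix (num_proj rows of length dim).
proj = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_proj)]

residual = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in vector
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]

signs, norm = qjl_encode(residual, proj)
estimate = qjl_inner_product(query, signs, norm, proj)
exact = sum(q * r for q, r in zip(query, residual))
print(round(exact, 3), round(estimate, 3))
```

The estimate becomes more accurate as the number of projections grows; the large projection count used here is chosen only to make the toy example stable.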
KAIST said it expected such technological advances to provide mid- to long-term momentum for the semiconductor memory market. In the short term, the reduced memory capacity required to run the same AI model could make demand growth look as though it is slowing, but KAIST described the technology as a catalyst for the popularisation of AI.
Han said the sharp rise in memory use as AI model performance increases has been cited as the biggest limitation. He said the research presented a new direction that can effectively reduce the bottleneck while maintaining accuracy. He added that it was expected to be used as a core foundational technology to run large-scale AI models more efficiently in the future.