SKT’s Euljiro office building. [Photo: SK Telecom]

On Jan. 7, SK Telecom’s elite team participating in the government’s “independent AI foundation model” project released a technical report for its 519-billion-parameter ultra-large AI model “A.X K1” on the open-source platform Hugging Face.

The SK Telecom team completed A.X K1, South Korea’s first ultra-large model with more than 500 billion parameters, with a design that wrings maximum efficiency out of a range of techniques. It built the 519-billion-parameter model despite a short development period of about four months and limited GPU resources.

A.X K1 achieved similar or higher performance on major benchmarks than widely used ultra-large models worldwide, including DeepSeek-V3.1. Typically, as the number of parameters increases, so do the optimisation time and GPU resources required. The team stressed that it secured strong performance even though the model is at least twice as large as those built by the other elite teams.

A.X K1’s performance can be improved further by adding computing resources and data over an extended research period. SKT plans to add multimodal capabilities within the year and scale the model to the trillion-parameter level.

The team trained A.X K1 using 1,000 GPUs. It first estimated the total feasible training volume from the training period and GPU scale, then applied scaling theory to determine the largest model that budget could support. On that basis it set a target of a 519-billion-parameter model with a globally distinctive parameter structure and trained it on about 10 trillion items of data.
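
For a sense of how that sizing logic works, below is a minimal sketch of compute-budget arithmetic using the common C ≈ 6ND approximation from scaling-law research. The per-GPU throughput, utilisation rate and tokens-per-parameter ratio are illustrative assumptions, not figures from SKT’s report, and dense-model scaling rules apply only loosely to an MoE model like A.X K1.

```python
# Minimal sketch: sizing a model to a fixed compute budget, assuming
# the common approximation C ~ 6 * N * D (C: training FLOPs,
# N: parameters, D: training tokens). All numbers are illustrative.

GPUS = 1_000                 # GPUs available (from the article)
DAYS = 120                   # ~4-month development window
FLOPS_PER_GPU = 1e15         # assumed sustained FLOP/s per GPU (hypothetical)
UTILISATION = 0.4            # assumed effective utilisation (hypothetical)

# Total compute budget in FLOPs over the training window.
C = GPUS * DAYS * 24 * 3600 * FLOPS_PER_GPU * UTILISATION

# Tokens-per-parameter ratio (Chinchilla-style work suggests ~20; assumed here).
TOKENS_PER_PARAM = 20

# From C = 6 * N * D and D = r * N, solve N = sqrt(C / (6 * r)).
N = (C / (6 * TOKENS_PER_PARAM)) ** 0.5
D = TOKENS_PER_PARAM * N

print(f"compute budget: {C:.2e} FLOPs")
print(f"compute-optimal model size: {N:.2e} parameters")
print(f"matching data budget: {D:.2e} tokens")
```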

It used more than 1,000 GPUs on an ongoing basis for AI training. To maximise the return on those GPU resources, it mathematically designed and managed the optimal amount of training computation. SKT said it achieved the target using only GPUs it procured itself, without government support.

A.X K1 delivered strong performance in areas such as math and coding. Benchmark figures in the report allow performance-to-scale comparisons with DeepSeek-V3.1, which has 685 billion parameters, and the open-source model GLM-4.6, which has 357 billion.

In math, it scored 89.8 on the AIME25 benchmark, or 102 percent of DeepSeek-V3.1’s 88.4. On LiveCodeBench, which measures real-time coding problem-solving, it scored 75.8 in English and 73.1 in Korean. Compared with DeepSeek-V3.1’s 69.5 in English and 66.2 in Korean, that works out to 109 percent and 110 percent, respectively.
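
The relative percentages follow directly from the quoted scores; a quick check in Python:

```python
# Relative performance = (A.X K1 score / DeepSeek-V3.1 score) * 100,
# using the scores quoted in the report.
pairs = {
    "AIME25 (math)":      (89.8, 88.4),
    "LiveCodeBench (EN)": (75.8, 69.5),
    "LiveCodeBench (KO)": (73.1, 66.2),
}
for name, (ax_k1, deepseek) in pairs.items():
    print(f"{name}: {ax_k1 / deepseek * 100:.0f}%")
# -> 102%, 109%, 110%, matching the figures cited above.
```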

A.X K1 also improved efficiency by selectively activating only 33 billion of its 519 billion parameters. It adopted a mixture-of-experts (MoE) architecture to secure both stability and efficiency in AI training. MoE is a method in which many small expert models divide up a single large problem, with only the experts relevant to a given input actually running.
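
To make that selective activation concrete, here is a minimal sketch of top-k MoE routing with toy dimensions; the expert count, hidden size and top-k value are hypothetical, not A.X K1’s actual configuration:

```python
# Minimal sketch of top-k mixture-of-experts routing (toy sizes, numpy);
# illustrates the general MoE idea, not A.X K1's actual architecture.
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2      # hidden size, experts, experts used per token

W_gate = rng.standard_normal((D, N_EXPERTS))                        # router weights
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]   # toy expert FFNs

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ W_gate                            # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # indices of top-k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # one token at a time, for clarity
        chosen = logits[t, top[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                   # softmax over the chosen experts only
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ experts[e])      # only k of the N experts run
    return out

tokens = rng.standard_normal((4, D))               # 4 toy tokens
print(moe_layer(tokens).shape)                     # (4, 16): same shape, ~k/N of the compute
```

With 2 of 8 experts active per token in this toy setup, each token touches only a fraction of the total parameters, which is how a 519-billion-parameter model can run at roughly a 33-billion-parameter cost per step.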

A.X K1 can also handle long contexts of 128,000 tokens at a time. That is about 100,000 words in Korean, enough for the model to review an entire novel or a full corporate annual report in a single pass.
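
The word count follows from the tokens-to-words ratio implied by the article’s own figures, roughly 0.78 words per token for Korean text (actual ratios vary with the tokeniser and content): 128,000 tokens × ~0.78 words per token ≈ 100,000 words.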

Copyright © DigitalToday. All rights reserved. Unauthorized reproduction and redistribution are prohibited.