As the government-led “homegrown AI foundation model” project enters its second evaluation phase, industry interest is rising over the direction of model development. After whether a team built its model “from scratch” emerged as the key variable in the first evaluation, expectations are growing that multimodal capabilities will be the watershed in the second round.
The 3 elite teams that passed the first evaluation, SK Telecom (SKT), LG AI Research and Upstage, are preparing to build multimodal AI foundation models, industry sources said on Jan. 28. The move is seen as a strategy to seek differentiation in a direct performance contest that goes beyond building from scratch, which means carrying out the entire development process in-house.
SKT put forward “A.X K1,” a 519B-class large language model (LLM) with more than 500 billion parameters, in the first evaluation. A.X K1 showed strengths in advanced mathematics and coding. SKT plans to enhance the model’s multimodal functions so it can also process voice and video data, aiming to boost its usability and secure competitiveness in the second evaluation as well.
An SKT official said, “The SKT elite team plans to apply multimodal functions sequentially starting with image data from the second-stage evaluation.” The official added, “To advance performance, we plan to expand the scale of training data and increase the number of training languages to 5.”
LG AI Research is also known to be continuing research and development with the ultimate goal of building a multimodal model. Upstage likewise outlined plans to secure multimodal functions in its first-round public presentation.
Multimodal capabilities also bear on the feasibility of delivering the public services the homegrown AI foundation model project seeks, since they open up varied use scenarios such as handling civil complaints by voice, analyzing video data, and processing documents and images together. An industry official said, “It is now less about whether you can build your own model and more about how much differentiated performance and usability you can show.” The official added, “Moves to showcase the potential for applying multimodal models to public services will likely become more prominent.”
If the 3 elite teams that have already advanced to the second evaluation enter a multimodal contest, a key point to watch will be how the companies vying for the additional slot follow this trend. In the first evaluation, the Ministry of Science and ICT departed from its original plan, deciding to drop 2 teams and select 1 additional team.
KT, Kakao and Naver have all dropped their plans to try again. Two startups, Motif Technologies and Trillion Labs, have said they intend to take part in the additional selection. Motif Technologies is highlighting its experience developing both high-performance LLMs and multimodal foundation models as a strength, while Trillion Labs entered the competition by emphasizing model independence and controllability.
The industry also expects that the competitive landscape of the Dokpamo project, as the homegrown AI foundation model initiative is known, could change from the second evaluation. While the first evaluation focused on verifying independence, the second could be the phase in which each elite team’s technical direction becomes more clearly visible. Another point of interest is how much weight multimodal capabilities will carry in the actual evaluation process.
An industry official said, “The second evaluation will be a stage to see in what direction each elite team is growing its model and to check future application plans.”