Naver unveils core technology behind AI Tab

Lee Ki-chang (이기창), a Naver Cloud director, speaks at the 'AI Search Tech Deep Talk' event on July 2. [Photo: Naver]

If traditional search mainly finds documents or information that match a question, AI search focuses on understanding a user's intent and linking it to actions such as shopping, booking and navigation. That is the aim of Naver's conversational AI search service AI Tab, which it officially launched on June 25.

Naver held a Tech Deep Talk session at D2SF in Seoul's Gangnam district on July 2 under the theme "From discovery to action: Naver AI search built by next-generation AI technology" and disclosed the key technologies applied to AI Tab. It introduced three: "Product Native LLM", a model optimised for AI Tab; "harness engineering", which operates AI to fit service requirements; and Smart Lens-based "multimodal" technology.

The need for such technology lies in AI Tab's service structure. The previously introduced AI Briefing could control exposure and costs by showing AI answers on only a portion of search results, but AI Tab must provide a conversational answer every time a user selects the tab. For Naver, where tens of millions of people search each day, that structure means response quality, speed, processing costs and stability all have to be solved at the same time.

Kim Sang-beom (김상범), head of Naver's search platform division, said, "AI Tab is structured so that if you want to see results whenever you ask, it has to show them." He said it would be hard to launch without the ability to handle a huge volume of traffic and some confidence in quality. He added, "Because we cannot spend unlimited costs, the result of thinking about how to do it efficiently is the three things we will introduce today: the model, harness engineering and multimodal."

◆AI model tailored to services... trained on search, purchase and booking flows

Naver applied a lightweight specialised model based on the existing HyperCLOVA X (HCX) to AI Tab. Naver calls it a "Product Native LLM". The name signals that the model is designed not just to score highly on general knowledge evaluations but to work well in real search, comparison and booking flows.

Lee Ki-chang (이기창), director of hyperscale AI models at Naver Cloud, said, "The existing HyperCLOVA X was a general-purpose LLM with broad knowledge and reasoning capabilities." He said the next-generation model focuses on continuing multi-turn conversations in long conversational contexts, selecting tools suited to the situation and completing the tasks users want through to the end. He added, "The goal is not to rank first in every benchmark, but to make a model that works best at the moment of actual service."

Model development was carried out along three axes: data, architecture and learning methods. On data, Naver improved training data quality with document-quality filters. It also expanded the training scope beyond the previously collected documents at primary and secondary education level to include high-difficulty materials such as court precedents and specialist papers, as well as documents with high everyday utility such as product reviews, recipes and game guides.

On architecture, it introduced an MoE (Mixture of Experts) structure. Unlike the existing transformer structure, where computation increased with the square of input length as inputs grew longer, it improved the system so computation is proportional to input length. As a result, response time remained nearly constant up to the 16,000-token range, and operating costs fell because it can handle more requests with the same computing resources.
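To make the scaling claim concrete, the short sketch below compares how relative cost grows when computation scales with the square of input length versus in proportion to it. Only the 16,000-token figure comes from Naver's presentation; the 4,000-token baseline and the function name `relative_cost` are illustrative assumptions, and the sketch says nothing about how Naver's architecture is actually implemented.

```python
def relative_cost(tokens: int, baseline_tokens: int, exponent: int) -> float:
    """Cost relative to the baseline, assuming cost grows as tokens**exponent."""
    return (tokens / baseline_tokens) ** exponent


BASELINE = 4_000  # assumed reference input length, not a Naver figure

for n in (4_000, 8_000, 16_000):
    quadratic = relative_cost(n, BASELINE, 2)  # cost grows with the square of length
    linear = relative_cost(n, BASELINE, 1)     # cost grows in proportion to length
    print(f"{n:>6} tokens: quadratic x{quadratic:.0f}, linear x{linear:.0f}")
```

Under these assumptions, quadrupling the input from 4,000 to 16,000 tokens multiplies quadratic cost by 16 but linear cost by only 4, which is why the difference matters most for long conversational contexts.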

In the training stage, it increased the share of reinforcement learning. It expanded computing resources to more than double those used for HCX and built a training environment linked to a user simulator and Naver's actual search and booking tools. For example, after a user asks, "Recommend a restaurant in Gangnam with a good atmosphere," and adds conditions such as, "Focus on Sinsa-dong, with a 7 p.m. reservation available for 2 people," the model is trained to sequentially call tools for place search and checking reservation availability.
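The sequential tool use described in that example could look roughly like the sketch below. It is a minimal illustration, not Naver's system: `search_places`, `check_reservation`, the `Place` type and the in-memory data are all hypothetical stand-ins for real search and booking tools.

```python
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    district: str
    open_slots: dict  # maps "HH:MM" to the largest party size still bookable


# Hypothetical data standing in for real place-search and booking back ends.
PLACES = [
    Place("Bistro A", "Sinsa-dong", {"19:00": 4}),
    Place("Cafe B", "Sinsa-dong", {"19:00": 1}),
    Place("Grill C", "Yeoksam-dong", {"19:00": 6}),
]


def search_places(district):
    """Hypothetical place-search tool: return places in the given district."""
    return [p for p in PLACES if p.district == district]


def check_reservation(place, time, party_size):
    """Hypothetical booking tool: is a table for party_size free at time?"""
    return place.open_slots.get(time, 0) >= party_size


def recommend(district, time, party_size):
    """Call the two tools in sequence: search first, then check availability."""
    candidates = search_places(district)
    return [p.name for p in candidates if check_reservation(p, time, party_size)]


print(recommend("Sinsa-dong", "19:00", 2))
```

In a trained model the order of calls would be chosen by the model itself; here it is hard-coded only to show the flow from place search to reservation check.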

Training techniques to reduce hallucinations were also applied. It introduced "Clarify RL", which prompts the system to ask for additional conditions rather than answering arbitrarily when it cannot provide an answer. For a question with missing key information such as, "Who is the lead actor in that drama," it is trained to first confirm which drama it is, instead of guessing a specific work and answering. According to Naver, the specialised model applying this technique reduced hallucination rates by up to 30 percentage points compared with HCX, based on Artificial Analysis' AA-Omniscience benchmark.
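Naver did not disclose how Clarify RL scores model behaviour. As a rough illustration of the idea only, the sketch below shows one way a reward signal could favour asking a clarifying question over guessing. The function name `clarify_reward`, the action labels and every reward value are assumptions made for this example.

```python
def clarify_reward(question_is_ambiguous: bool, action: str,
                   answer_correct: bool = False) -> float:
    """Toy reward: favour clarifying when key information is missing.

    All values are illustrative assumptions, not Naver's settings.
    """
    if action == "clarify":
        # Asking back is rewarded only when the question really is ambiguous.
        return 1.0 if question_is_ambiguous else -0.5
    if action == "answer":
        if question_is_ambiguous:
            # Guessing a specific drama is penalised even if the guess is lucky.
            return -1.0
        return 1.0 if answer_correct else -1.0
    raise ValueError(f"unknown action: {action}")


# "Who is the lead actor in that drama?" with no drama named
print(clarify_reward(True, "clarify"))
print(clarify_reward(True, "answer", answer_correct=True))
```

The small penalty for clarifying an unambiguous question reflects a design concern that any such scheme would likely face: a model that always asks back avoids hallucination but becomes unhelpful.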

Naver evaluated model performance in three groups: service, core and specialised capabilities. Service capability, which reflects quality in search, purchase and booking, scored 108 points against a global peer-model average of 100, while core capability, measured by externally certified benchmarks for skills such as instruction following and tool calling, scored 104. For specialised capabilities such as GPQA, which covers PhD-level science problems, Naver set a target of 85 percent of the competitor average and exceeded it. Lee said the strategy is to invest most heavily in service capability, followed by core and then specialised capabilities. Naver did not disclose the model's specific parameter scale, saying it does not treat model size itself as a competitive target.

◆A good model is not enough... 'work skills' that link search, shopping and Places

A good model alone does not make a complete service. Because a language model is trained on data up to a certain point and does not know the latest information, it needs to be connected with search infrastructure and service tools. That role is performed by "harness engineering".

Han Seung-gyun (한승균), leader of Naver's AI search service, likened it to AI's "work skills" and defined it as "a technology and operating system that brings out as much of the model's capabilities as possible while making it work to fit service requirements."

When a question comes in, AI Tab determines whether it is a request it can answer safely, organises conversation context and user intent, and then calls the necessary search, shopping and Places tools to construct an answer. It also provides action cards for booking, navigation and purchasing. For example, for the question, "I'm having a team dinner in Jeongja-dong today, so find a restaurant with good parking and reservations," it finds candidate restaurants, checks parking convenience through reviews and then checks reservation availability through a reservation API.
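The flow described above (safety check, intent parsing, tool calls, then an answer with action cards) can be sketched as follows. This is a toy illustration under stated assumptions: the keyword-based `is_safe` and `parse_intent` functions, the `RESTAURANTS` data and the card format are all hypothetical, and a production harness would rely on models and real APIs at each step.

```python
# Hypothetical data standing in for search results, reviews and a reservation API.
RESTAURANTS = {
    "Jeongja-dong": [
        {"name": "Restaurant A", "reviews": ["easy parking", "great for groups"], "bookable": True},
        {"name": "Restaurant B", "reviews": ["no parking nearby"], "bookable": True},
        {"name": "Restaurant C", "reviews": ["easy parking"], "bookable": False},
    ]
}


def is_safe(question):
    """Stand-in safety gate; a real system would use a dedicated classifier."""
    return "password" not in question.lower()


def parse_intent(question):
    """Toy intent extraction by keyword matching."""
    q = question.lower()
    return {
        "area": next((a for a in RESTAURANTS if a.lower() in q), None),
        "needs_parking": "parking" in q,
        "needs_reservation": "reservation" in q,
    }


def answer(question):
    """Run the steps in order and attach a booking card per surviving candidate."""
    if not is_safe(question):
        return {"text": "Sorry, I can't help with that.", "action_cards": []}
    intent = parse_intent(question)
    candidates = RESTAURANTS.get(intent["area"], [])
    if intent["needs_parking"]:
        candidates = [r for r in candidates
                      if any("easy parking" in v for v in r["reviews"])]
    if intent["needs_reservation"]:
        candidates = [r for r in candidates if r["bookable"]]
    names = [r["name"] for r in candidates]
    cards = [{"type": "booking", "place": n} for n in names]
    return {"text": "Candidates: " + ", ".join(names), "action_cards": cards}


result = answer("I'm having a team dinner in Jeongja-dong today, "
                "so find a restaurant with good parking and reservations")
print(result["text"])
```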

To run this process efficiently, Naver applied a "division-of-labour SLM" structure. It uses smaller models divided by role instead of a single huge model, cutting equipment operating costs for some components by up to threefold and more than doubling response speed. The baseline for comparison is not existing general search but the structure used at AI Tab's initial design stage. Han said this does not mean AI Tab is more efficient than existing Naver search, but that it improved more than threefold compared with the large model used when the AI search service was first built. He said that, in internal measurements, the first answer takes about 10 seconds on average to produce, roughly a twofold improvement on the 20 to 30 seconds of the initial design.
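Naver did not describe how roles are split among its smaller models. As a loose sketch of the general idea, the code below routes one request through a chain of role-specific steps, each of which could be served by a separate small model. The role names, the keyword rules inside each function and the `run_pipeline` helper are assumptions for illustration.

```python
def classify_safety(state):
    """Role 1 (assumed): decide whether the request can be handled."""
    state["safe"] = "exploit" not in state["question"].lower()
    return state


def extract_intent(state):
    """Role 2 (assumed): label the request as booking or plain search."""
    state["intent"] = "booking" if "book" in state["question"].lower() else "search"
    return state


def compose_answer(state):
    """Role 3 (assumed): write the final response from the gathered state."""
    state["answer"] = f"[{state['intent']}] response for: {state['question']}"
    return state


# Each entry stands in for a small model dedicated to one role.
ROLE_MODELS = [
    ("safety", classify_safety),
    ("intent", extract_intent),
    ("compose", compose_answer),
]


def run_pipeline(question):
    state = {"question": question, "trace": []}
    for role, model in ROLE_MODELS:
        state = model(state)
        state["trace"].append(role)
        if role == "safety" and not state["safe"]:
            # Stop early so later, costlier roles are never invoked.
            state["answer"] = "Request declined."
            break
    return state


print(run_pipeline("Book a table for two")["answer"])
```

One appeal of such a split is that each role can be sized, scaled and replaced independently, and cheap early steps can short-circuit requests before the more expensive ones run.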

Naver cited long-accumulated Korean-language search data and service assets such as blogs, cafes, shopping and Places as sources of this competitiveness.

In a Q&A session, policy-related questions also came up. Asked whether personal information consent was obtained for training data, Han answered, "We use only posts that are fully public and allowed for search, and after internal review we use only writings with no issues." On plans to introduce ads into AI Tab, he answered, "As of now, there are no plans to run ads," and said Naver prioritises answer reliability.

Asked about results since the official launch, he said the number of users had risen more than three- to fourfold compared with the beta service and that use of shopping and Places action cards had also increased. On competitiveness against global AI chatbots, he cited know-how built up over many years of accumulating Korean-language information and search data as the biggest strength. Asked whether Naver would introduce usage limits, he answered, "There are no such plans yet."

◆Smart Lens placed at the front of the search bar... AI search expands via images

The third axis is multimodal technology. Naver outlined plans to place a Smart Lens button at the front of the mobile search bar and link image-based search with AI Tab. The goal is for AI to understand the subject, mood and context when a user shows it a scene in a photo or video, and to connect that understanding to search, shopping and booking.

Yoon Sang-doo (윤상두), leader of Naver Future AI Center, said, "People no longer input only text." He said they go beyond asking "What is this?" and demand understanding and execution at once, such as "Find products similar to this" and "Book a place with this kind of atmosphere."

Naver has been advancing its image search technology since launching Smart Lens in 2017. In 2022, the service evolved into combined search that takes images and text together as input, and in 2025 it was linked to AI Briefing, which understands and summarises images. Going forward, Naver plans to expand it into a multimodal agent that understands both image and text conditions and links them to execution, handling requests such as, "Book for 4 people this evening at a place in my neighbourhood with an atmosphere similar to the cafe in this video."

The supporting technology is multimodal embedding, which places different types of information, such as images and text, in the same semantic space so AI can understand them together. Naver introduced MuCo (Multi-turn Contrastive Learning), a technology it said was recognised at the global computer vision conference CVPR. The model learns from multiple successive questions about a single image in an actual conversational flow, so it can maintain context without reprocessing the image each time the question changes. Naver said it built a dataset of 35 million items to advance multimodal search and recorded top-level performance compared with competing models on major benchmarks.
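The two ideas in that description, a shared embedding space and reusing the image representation across turns, can be illustrated with a toy sketch. It is not MuCo, whose training method the presentation did not detail: the three-dimensional vectors are made-up stand-ins for encoder outputs, and `ImageSession` is a hypothetical helper that simply caches the image embedding.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for image and text encoder outputs
# that live in the same semantic space.
IMAGE_VECTORS = {"cafe_photo": [0.9, 0.1, 0.2]}
TEXT_VECTORS = {"cozy cafe": [0.8, 0.2, 0.1], "running shoes": [0.1, 0.9, 0.3]}


class ImageSession:
    """Encode the image once, then reuse the embedding for every question."""

    def __init__(self, image_id):
        self.image_id = image_id
        self.encode_calls = 0
        self._cache = None

    def image_embedding(self):
        if self._cache is None:
            self.encode_calls += 1  # counts how often the image is "encoded"
            self._cache = IMAGE_VECTORS[self.image_id]
        return self._cache

    def best_match(self, queries):
        emb = self.image_embedding()
        return max(queries, key=lambda q: cosine(emb, TEXT_VECTORS[q]))


session = ImageSession("cafe_photo")
print(session.best_match(["cozy cafe", "running shoes"]))
print(session.best_match(["running shoes", "cozy cafe"]))
print("image encoded", session.encode_calls, "time(s)")
```

Because images and text share one space, a plain similarity measure is enough to compare them, and because the image embedding is cached, follow-up questions do not trigger another encoding pass.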

Yoon said, "The visual search technology accumulated through Smart Lens is a core technology that makes the eyes through which AI agents see the world." He said AI agents can extend in various directions, including a visual assistant that understands real-time camera screens, "Computer Use" that directly looks at a screen and performs clicks and inputs, and world models and robotics based on understanding physical space.

Naver plans to integrate AI Briefing and Smart Lens more closely with AI Tab in the third quarter, and to connect its real estate service to AI Tab. It also previewed an agent exclusive to its Whale browser and said it would launch a health agent within the year.

The model, harness engineering and multimodal technology that Naver disclosed do not operate separately. If the service-optimised model is the brain, harness engineering is the work skills that make that brain use tools such as search, shopping and booking accurately, and multimodal is the eyes that expand input beyond text to images. Naver said it is using these three axes to shorten the path from users finding the results they want to taking real actions.

Hojeong Lee lhj@d-today.co.kr
