OpenAI adds 3 real-time voice models to API for translation, transcription and conversation


Chi-gyu Hwang

published 2026-05-08 15:39:47


OpenAI has added three new voice models to its API with real-time conversation, translation and transcription features, TechCrunch reported on May 8 local time.

The first of the three models is GPT-Realtime-2. Unlike its predecessor, GPT-Realtime-1.5, it has GPT-5-class reasoning capabilities and can handle complex user requests, OpenAI said. The second is GPT-Realtime-Translate, which translates in real time, keeping pace with the conversation. It understands input in more than 70 languages and produces output in 13. The third is GPT-Realtime-Whisper, which converts speech to text in real time while a conversation is under way.

OpenAI said, "The models we are launching this time are evolving real-time audio beyond simple question-and-answer into a voice interface that can listen, reason, translate, transcribe and act as the conversation unfolds."

OpenAI cited customer service, education, media, events and creator platforms as key use cases. The company said it has built guardrails to prevent misuse such as spam and fraud, along with a mechanism that stops a conversation when a violation of its harmful-content guidelines is detected.

All three models are available through the OpenAI Realtime API. Translate and Whisper are billed per minute, while GPT-Realtime-2 is charged based on token consumption.
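
To illustrate the difference between the two billing schemes, here is a minimal sketch in Python. The rates and function names are hypothetical placeholders, since no prices have been stated; only the structure (duration-based versus token-based charging) reflects the description above.

```python
# Placeholder rates for illustration only; these are NOT actual OpenAI prices.
HYPOTHETICAL_RATE_PER_MINUTE = 0.5      # currency units per minute of audio
HYPOTHETICAL_RATE_PER_1K_TOKENS = 0.25  # currency units per 1,000 tokens


def per_minute_cost(audio_seconds: float, rate_per_minute: float) -> float:
    """Cost of a session billed by audio duration (Translate/Whisper style)."""
    return audio_seconds / 60 * rate_per_minute


def per_token_cost(tokens: int, rate_per_1k_tokens: float) -> float:
    """Cost of a session billed by tokens consumed (GPT-Realtime-2 style)."""
    return tokens / 1000 * rate_per_1k_tokens


if __name__ == "__main__":
    # A two-minute session under duration billing.
    print(per_minute_cost(120, HYPOTHETICAL_RATE_PER_MINUTE))
    # A session that consumed 2,000 tokens under token billing.
    print(per_token_cost(2000, HYPOTHETICAL_RATE_PER_1K_TOKENS))
```

Under per-minute billing the cost depends only on how long the audio runs, whereas under token billing it depends on how much the model processes and generates, so two calls of the same length can cost different amounts.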

Chi-gyu Hwang delight@d-today.co.kr
