
OpenAI is developing a two-way voice model that adjusts responses in real time during a conversation, The Information reported on Wednesday.

ChatGPT's current advanced voice mode works on a "turn-based" system: the model processes speech and generates a response only after the user finishes speaking.

If a user makes a brief reaction mid-conversation, such as "uh-huh" or "OK", the model stops speaking and treats it as a separate utterance. And once response generation begins, the content cannot be changed mid-stream. That is why the experience can feel awkward and rigid compared with talking to a person.

The model under development, called "BiDi" (bidirectional), continuously processes the speaker's voice and adjusts its response on the fly, even when an interruption occurs.

In a customer-support scenario, for example, if a user interrupts midway through a return process and changes their mind to an exchange, the existing model stops and appears confused. BiDi can grasp the new intent and continue the conversation naturally with the exchange process, The Information said.
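The behavioral difference described above can be sketched in a few lines of Python. This is a purely illustrative toy, not OpenAI's implementation; the class names, the backchannel set, and the reply format are all hypothetical.

```python
# Illustrative sketch (NOT OpenAI's architecture): contrasting a turn-based
# voice loop with a bidirectional ("BiDi"-style) one. All names hypothetical.

class TurnBasedAgent:
    """Waits for each utterance to finish, then commits to one reply."""

    def converse(self, utterances):
        replies = []
        for utterance in utterances:
            # Even a backchannel like "uh-huh" is treated as a separate
            # turn: the agent stops speaking and replies to it on its own.
            replies.append(f"[reply to: {utterance}]")
        return replies


class BiDiStyleAgent:
    """Processes input continuously and revises its plan mid-stream."""

    BACKCHANNELS = {"uh-huh", "OK"}  # assumed set, for illustration

    def converse(self, chunks):
        intent = None
        for chunk in chunks:
            if chunk in self.BACKCHANNELS:
                continue  # keep talking; a backchannel is not a new turn
            intent = chunk  # latest substantive input wins, even mid-reply
        return f"[reply to: {intent}]"


dialogue = ["start a return", "uh-huh", "make it an exchange"]
turn_based = TurnBasedAgent().converse(dialogue)
bidi = BiDiStyleAgent().converse(dialogue)
# turn_based answers each fragment separately, including the "uh-huh";
# bidi ignores the backchannel and commits to the final intent (exchange).
```

The turn-based agent produces three disjoint replies, mirroring the "stops and appears confused" behavior; the BiDi-style agent carries the updated intent through to a single coherent reply.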

BiDi is expected to be built into voice-first devices such as smart speakers; OpenAI is reportedly considering developing a smart speaker of its own.

BiDi is not yet complete. After a conversation runs for several minutes, the prototype can malfunction or start speaking in an abnormal voice, The Information reported, citing sources. OpenAI initially aimed to launch BiDi in the first quarter, but the release could slip past the second quarter, the report said.

Copyright © DigitalToday. All rights reserved. Unauthorized reproduction and redistribution are prohibited.