OpenAI announced that its API now includes new voice intelligence capabilities that let developers build apps that can converse with, transcribe, and translate user interactions. The company’s latest model, GPT‑Realtime‑2, produces natural-sounding spoken exchanges with users and incorporates GPT‑5‑class reasoning to handle intricate requests, a step up from the earlier GPT-Realtime-1.5.
Additionally, GPT‑Realtime‑Translate offers real-time translation that keeps up with user conversations, supporting over 70 input languages and 13 output languages. The new transcription feature, GPT-Realtime-Whisper, provides live speech-to-text functionality during interactions.
“These models transform real-time audio engagement from mere call-and-response to sophisticated voice interfaces that listen, reason, translate, transcribe, and react as conversations evolve,” OpenAI stated.
These updates are aimed at companies improving customer service and also serve sectors such as education, media, and creator platforms. OpenAI has built in safeguards against misuse of the new features, such as spam or fraud, including triggers that halt conversations that violate its content guidelines.
All three models are available through OpenAI’s Realtime API. The Translate and Whisper models are billed per minute of audio, while GPT-Realtime-2 is billed by token usage.
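As a rough illustration of how a developer might target these models, the sketch below builds a session-configuration event for a Realtime API WebSocket connection. The endpoint URL, event names, and field layout follow OpenAI's published Realtime API conventions, but the model name is taken from the announcement above and the exact schema for these new models is an assumption — treat this as a sketch, not a verified integration.

```python
import json

# Hypothetical endpoint: model name taken from the announcement above,
# query-parameter style follows OpenAI's existing Realtime API.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"

def build_session_update(voice: str = "alloy", instructions: str = "") -> str:
    """Build a session.update event configuring the live voice session.

    The event shape mirrors OpenAI's Realtime API conventions; field
    support for the new models described here is assumed, not confirmed.
    """
    event = {
        "type": "session.update",
        "session": {
            "voice": voice,                   # requested output voice
            "instructions": instructions,     # system-style guidance
            "modalities": ["audio", "text"],  # speak and transcribe
        },
    }
    return json.dumps(event)

payload = build_session_update(
    instructions="You are a helpful, concise voice assistant."
)
```

In practice this payload would be sent over an authenticated WebSocket (for example with the `websockets` package and an `Authorization: Bearer` header) before streaming microphone audio to the session.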
