OpenAI has launched three realtime voice models for developers building voice applications. Each model targets a distinct task: reasoning, translation, or transcription.
### Overview of OpenAI’s New Voice Models
The three voice models are:
1. **GPT‑Realtime‑2**: OpenAI's first voice model with GPT‑5-class reasoning, which lets it handle complex requests while keeping a natural conversational flow.
2. **GPT‑Realtime‑Translate**: A real-time translation model capable of converting spoken language from more than 70 input languages into 13 output languages on the fly.
3. **GPT‑Realtime‑Whisper**: A live speech-to-text model that transcribes speech as it is spoken.
### Detailed Features
#### GPT‑Realtime‑2
This model is built for real-time voice interactions. It keeps a conversation moving while it reasons through requests, handles corrections and interruptions, and delivers contextually relevant replies.
#### GPT‑Realtime‑Translate
The translation model translates speech in real time. It accepts more than 70 input languages and produces 13 output languages, with the aim of easing communication across language barriers.
#### GPT‑Realtime‑Whisper
This transcription model emphasizes low-latency speech-to-text conversion. It transcribes audio as it arrives, which suits applications that need prompt feedback, such as live captions or meeting notes that update as the conversation proceeds.
### Pricing Structure
All three models are available via OpenAI’s Realtime API, with the following pricing:
- **GPT‑Realtime‑2**: $32 per 1 million audio input tokens ($0.40 for cached input tokens) and $64 per 1 million audio output tokens.
- **GPT‑Realtime‑Translate**: $0.034 per minute.
- **GPT‑Realtime‑Whisper**: $0.017 per minute.
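
To make these prices concrete, here is a rough cost estimator in Python. It is a minimal sketch, not an official calculator: the function names are hypothetical, and it assumes that the $0.40 cached rate is also quoted per 1 million tokens and that cached tokens are counted separately from regular input tokens. Actual billing may differ, so check an invoice before relying on these numbers.

```python
# Prices in USD, taken from the list above.
REALTIME_2_INPUT_PER_M = 32.00        # per 1M audio input tokens
REALTIME_2_CACHED_INPUT_PER_M = 0.40  # assumed to be per 1M cached input tokens
REALTIME_2_OUTPUT_PER_M = 64.00       # per 1M audio output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime_2_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate GPT-Realtime-2 audio cost in USD (hypothetical helper).

    Assumes cached tokens are billed separately from regular input tokens.
    """
    total = (
        input_tokens * REALTIME_2_INPUT_PER_M
        + cached_input_tokens * REALTIME_2_CACHED_INPUT_PER_M
        + output_tokens * REALTIME_2_OUTPUT_PER_M
    )
    return total / 1_000_000


def per_minute_cost(minutes, rate_per_minute):
    """Estimate cost in USD for the per-minute models (hypothetical helper)."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Example: a session with 200k input and 100k output audio tokens.
    print(f"Realtime-2: ${realtime_2_cost(200_000, 100_000):.2f}")
    # Example: one hour of audio on each per-minute model.
    print(f"Translate:  ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper:    ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

At the listed rates, an hour of translated audio comes to about $2.04 and an hour of transcription to about $1.02, so translation costs twice as much per minute as transcription.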
### Testing and Implementation
Developers can try out the new realtime voice models in the OpenAI Playground. Those who have Codex set up can use it to add GPT‑Realtime‑2 to existing applications or to build new ones.
For more details on the models and their uses, see [OpenAI's announcement](https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/).
