DeepL, the Cologne-based translation company known for its text translation tools, has introduced a voice product suite covering meetings, conversations, group settings, and enterprise integration via an API. At a live demo in Seoul, the translation lagged the speaker by one to two sentences, and DeepL’s chief product officer acknowledged that differences in word order between languages remain a challenge.
DeepL’s new suite, DeepL Voice-to-Voice, offers real-time spoken translation for business communication. It supports more than 40 languages, including all official EU languages as well as languages such as Vietnamese, Thai, Arabic, Norwegian, Hebrew, Bengali, and Tagalog. The product targets four scenarios: virtual meetings, mobile and web conversations, group settings for frontline workers, and enterprise applications through an API.
The suite’s components are at different stages of availability. Voice for Conversations, which translates conversations on mobile and web without requiring an app install, is generally available now. Voice for Meetings, which works with Microsoft Teams and Zoom and lets participants speak in their native language while others hear a simultaneous translation, opens for early access in June.
The Voice-to-Voice API, which lets companies embed spoken translation in customer-facing applications such as call-center software, is in early access. Spoken Terms, a feature for customizing vocabulary, including company and personal names, becomes generally available on May 7.
DeepL’s CEO, Jarek Kutylowski, described the launch as a new frontier in translation, one that lets people communicate naturally in their own language without the friction or cost of interpreters. DeepL positions its voice technology as an enterprise tool rather than a consumer product: the company says no customer data is used for model training and no data is stored after a call, a commitment aimed at regulated industries.
The system works in three steps: speech is transcribed to text, the text is translated by DeepL’s translation engine, and the translation is converted back to speech. DeepL argues that because its text translation models are stronger than competitors’, the quality of the voice output improves as well.
In a study by Slator, 96% of linguists preferred DeepL Voice over the translation features in Google Meet, Microsoft Teams, and Zoom, citing its fluency and accuracy. However, Chief Product Officer Gonzalo Gaiolas acknowledged that a demonstration at DeepL Connect Seoul on April 15 showed a delay of one to two sentences.
Gaiolas attributed the delay to structural differences between languages and said DeepL plans to reduce latency. By 2026, DeepL aims to release a feature that preserves speakers’ original voices in the translated audio.
DeepL is entering a competitive market. Rivals include Sanas, which uses AI to alter speakers’ accents; Camb.AI, which focuses on media dubbing; and Palabra, which offers real-time speech translation that preserves voice characteristics. Google, Microsoft, and Zoom already offer translation features in their own meeting platforms, so DeepL is betting on translation quality as its key differentiator against these established players.
