French AI company Mistral has unveiled a new open-source text-to-speech model called Voxtral TTS. Released on Thursday, this model is designed for use by voice AI assistants and in enterprise scenarios like customer support, putting Mistral in competition with companies such as ElevenLabs, Deepgram, and OpenAI. Voxtral TTS supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic.
Pierre Stock, VP of Science Operations at Mistral AI, told TechCrunch that customers had been asking for a speech model, which prompted the company to build a compact one that runs on smartwatches, smartphones, laptops, and other edge devices. Mistral says that despite its small size, the model delivers advanced performance at a significantly lower cost than alternatives on the market.
Mistral’s new model can clone a custom voice from a sample of less than five seconds, capturing details such as accents, inflections, intonations, and speech irregularities. Built on the Ministral 3B model, it can switch between languages without losing the voice's characteristics, which makes it suited to dubbing and real-time translation. The company says it aimed for speech that sounds human rather than robotic.
Optimized for real-time use, the model has a time-to-first-audio of 90ms for a 10-second, 500-character input and a real-time factor of 6x, meaning it generates audio about six times faster than it plays back. A 10-second clip therefore takes around 1.6 seconds to generate.
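As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. This assumes the real-time factor is the ratio of audio duration to generation time, which is how the figure is used here (some sources define it the other way around). The function and variable names are illustrative and not part of any Mistral API.

```python
def generation_time(audio_seconds: float, speedup: float) -> float:
    """Seconds needed to synthesize `audio_seconds` of speech.

    `speedup` is the real-time factor as quoted in the article:
    seconds of audio produced per second of compute (assumed definition).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


clip_seconds = 10.0   # length of the clip quoted in the article
speedup = 6.0         # the quoted 6x real-time factor
total = generation_time(clip_seconds, speedup)
print(f"{total:.2f} s to render a {clip_seconds:.0f} s clip")
```

The result is about 1.67 seconds, in line with the quoted figure of around 1.6 seconds. The 90ms time-to-first-audio is a separate measurement: it describes how soon playback can begin, not how long the full clip takes to generate.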
Previously, Mistral introduced transcription models tailored for large batch processing and real-time operations with low latency. The new speech model suggests a move towards offering a comprehensive suite of voice solutions for enterprises.
Stock explained that Mistral’s strategy is to build an end-to-end platform that can process multimodal input streams, including audio, text, and images, and produce corresponding outputs. The advantage, he said, is that an integrated agentic system handling audio as both input and output has more information to work with.
Mistral is counting on its open-source approach and customization options to give it an edge over rivals, since enterprises can tailor the models to their own needs.
