Cohere Unveils Open-Source Voice Model for Transcription


Enterprise AI company Cohere launched its first voice model on Thursday. Transcribe is an open-source automatic speech recognition model designed for tasks like note-taking and speech analysis. At 2 billion parameters, it is small enough to self-host on consumer-grade GPUs. It supports 14 languages: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese, Japanese, Korean, Vietnamese, and Arabic.

Cohere states that Transcribe outperforms models such as Zoom Scribe v1, IBM Granite 4.0 1B, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B Speech on the Hugging Face Open ASR leaderboard, with a word error rate (WER) of 5.42, the lowest on the leaderboard.
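For readers unfamiliar with the metric, WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a model's transcript into the reference transcript, divided by the number of reference words (N). Assuming the 5.42 figure is a percentage, as is conventional, it works out to roughly 5.42 errors per 100 reference words. The sketch below is a minimal illustration of that definition, not Cohere's or the leaderboard's scoring code. The function name `word_error_rate` is hypothetical, and real benchmarks typically normalize casing and punctuation before scoring, which this sketch skips by simply splitting on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Hypothetical helper for illustration; it only splits on whitespace
    and does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j]
    # (word-level Levenshtein distance, computed one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr

    return 100.0 * prev[len(hyp)] / len(ref)


# One substituted word out of five reference words gives a WER of 20%.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))
```

A model with a WER of 5.42 would, on average, make about one such error every 18 to 19 words.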

The company claims an average win rate of 61% over other models in accuracy, coherence, and usability, based on human evaluations. However, the model is less effective at transcribing Portuguese, German, and Spanish.

Transcribe can process 525 minutes of audio per minute, a high throughput for models in its class. Cohere plans to integrate it into North, its enterprise agent orchestration platform, and offers it for free through its API. It is also available on Model Vault, Cohere's managed inference platform.
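To put the throughput figure in concrete terms, 525 minutes of audio per minute of processing means an hour-long recording would take about 60 / 525 minutes, or roughly 6.9 seconds. The sketch below does that arithmetic. It assumes the 525x figure holds for whatever recording length and hardware you use; the article does not say what setup Cohere measured it on, and the function name `seconds_to_transcribe` is hypothetical.

```python
def seconds_to_transcribe(audio_minutes: float, throughput: float = 525.0) -> float:
    """Estimated wall-clock seconds to transcribe `audio_minutes` of audio.

    `throughput` is minutes of audio processed per minute of wall-clock
    time; the 525.0 default is Cohere's claimed figure, assumed constant.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return audio_minutes / throughput * 60.0


# An hour-long meeting recording at the claimed rate:
print(f"{seconds_to_transcribe(60):.1f} seconds")
```

In practice, throughput depends on batch size, GPU, and audio length, so the estimate is best read as a best-case figure.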

The popularity of speech recognition models is rising as demand grows for note-taking and dictation apps like Granola and Wispr Flow.

Earlier this year, Cohere reportedly told investors it generated $240 million in annual recurring revenue in 2025, and CEO Aidan Gomez has suggested the company could go public in the near future.
