Microsoft has launched MAI-Image-2, its second-generation image model, currently ranked #3 on Arena.ai’s text-to-image leaderboard, just behind Google’s Gemini 3.1 Flash and OpenAI’s GPT Image 1.5. Previously reliant on OpenAI’s models for generating images for Bing and Copilot, Microsoft’s own tech now comes directly from its in-house AI Superintelligence team, led by Mustafa Suleyman. Suleyman, previously CEO of Microsoft AI, now focuses solely on leading this team.
MAI-Image-1, released in October 2025, was Microsoft’s first internally developed image generation model, and significantly, MAI-Image-2 advances with insights from photographers and designers, honing in on photorealism, accurate text in images, and high-detail scene generation.
Currently available via the MAI Playground, the model is being integrated into Copilot and Bing Image Creator, with API access open for enterprise and impending broader developer availability through Microsoft Foundry. Meanwhile, their new GB200 compute cluster, based on NVIDIA’s Blackwell technology, is now operational, hinting at future model releases.
Microsoft’s rapid growth in AI models, including previously released voice and text models, contrasts its historically slower product cycles, showcasing enhanced self-reliance with proprietary hardware and infrastructure, moving away from renting from OpenAI.
