DeepMind has unveiled Genie 3, a world-generation model that creates fully interactive, three-dimensional environments in real time from text and image inputs. Think of it as an instant video game that runs at 720p and 24 frames per second and sustains those specifications for several minutes at a stretch.
Large language models (LLMs), including Google’s Gemini and OpenAI’s ChatGPT, have recently evolved at a rapid pace. This progress has enabled developers to refine the agent-based AI systems that serve as foundational components for tools such as Genie 3. At present, however, OpenAI has no offering that matches Genie 3’s world-building capabilities.
In some respects, the developers at DeepMind are revisiting their roots, particularly when it comes to AI training tools. During a segment of the “Google AI: Release Notes” podcast, DeepMind’s CEO Demis Hassabis remarked, “The emergence of the thinking models is somewhat reminiscent of our foundational gaming projects like AlphaGo and AlphaZero,” alluding to the company’s earlier focus on agent-based models. The former was an AI that defeated a world champion at the board game Go, while the latter quickly mastered chess, shogi, and Go without any human intervention.
Although LLMs have significantly advanced, platforms such as ChatGPT remain relatively basic when