AI Model Replicates Real-Time Gameplay of 1993’s Doom via Hallucination

### GameNGen: A Look Ahead at AI-Driven Video Games

On Tuesday, researchers from Google and Tel Aviv University unveiled **GameNGen**, an AI model that can interactively simulate the classic 1993 first-person shooter *Doom* in real time. The system builds on image synthesis techniques from **Stable Diffusion**, a widely used neural image generation model. GameNGen marks a major step toward real-time, AI-driven video game creation, hinting at a future where games are generated by neural networks rather than programmed by hand.

#### The Idea: AI as a Game Development Tool

In traditional video game design, graphics are produced by hand-written rendering code and fixed rules. GameNGen inverts this approach: rather than relying on a conventional renderer, a neural engine "envisions," or hallucinates, the graphics as the game is played. The result is a new framework in which generating each frame becomes a prediction problem, conditioned on the frames and player actions that came before.
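To make the contrast concrete, here is a minimal, purely hypothetical Python sketch (the `model.predict` interface is invented for illustration and is not GameNGen's actual API): a classical engine renders each frame from explicit game state, while a neural engine rolls frames out autoregressively from recent frames and player inputs.

```python
# Hypothetical sketch: classical rendering vs. neural frame prediction.
# `model.predict` is an invented interface for illustration only.

def classical_loop(state, update, render, get_action, num_steps):
    for _ in range(num_steps):
        state = update(state, get_action())  # explicit rules mutate game state
        yield render(state)                  # hand-written renderer draws it

def neural_loop(model, seed_frames, get_action, num_steps):
    history, actions = list(seed_frames), []
    for _ in range(num_steps):
        actions.append(get_action())
        # No game state and no renderer: the model predicts the next frame
        # from the recent frames and the action sequence.
        frame = model.predict(frames=history, actions=actions)
        yield frame
        history.append(frame)                # autoregressive rollout
```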

App developer Nick Dobos captured the excitement around the work, remarking, "The possibilities here are insane. Why manually write intricate rules for software when AI can process every pixel for you?"

#### The Mechanics of GameNGen

GameNGen can generate new frames of *Doom* gameplay at more than 20 frames per second using a single Tensor Processing Unit (TPU), a specialized chip designed for machine learning workloads. In tests, human raters struggled to distinguish real *Doom* footage from the AI-generated clips, correctly identifying the real gameplay only 58 to 60 percent of the time.
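Those figures imply a tight real-time budget and near-chance rater accuracy; a quick back-of-the-envelope calculation makes both concrete:

```python
# Back-of-the-envelope numbers from the figures reported above.
fps = 20
frame_budget_ms = 1000 / fps  # 50 ms to denoise and decode each frame
print(f"Per-frame budget at {fps} fps: {frame_budget_ms:.0f} ms")

# Raters picked the real footage 58-60% of the time; 50% is pure chance,
# so the generated clips are nearly indistinguishable from the real game.
for acc in (0.58, 0.60):
    print(f"Rater accuracy {acc:.0%} is only {acc - 0.5:.0%} above chance")
```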

The system runs a modified version of Stable Diffusion 1.4, the image synthesis diffusion model released in 2022. The researchers first trained a reinforcement learning agent to play *Doom* and recorded its sessions to build a training dataset. That footage was then used to fine-tune the Stable Diffusion model to predict the next frame of the game from the previous frames, conditioned on the player's actions.
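The paragraph above describes an action-conditioned, next-frame diffusion objective. A minimal PyTorch-style sketch of one such training step might look like this; the module names, tensor shapes, and channel-stacking scheme are assumptions for illustration, not the authors' code:

```python
import torch
import torch.nn.functional as F

NUM_TIMESTEPS = 1000  # standard Stable Diffusion noise-schedule length

def training_step(unet, vae_encode, add_noise, action_embed, batch):
    """One step of action-conditioned next-frame diffusion training.

    Sketch under stated assumptions: trajectories recorded from an RL agent
    playing the game supply (past frames, actions, next frame) tuples, and
    the model learns to denoise the next frame's latent given the history.
    """
    past_frames, actions, next_frame = batch
    past_latents = vae_encode(past_frames)   # (B, T, C, h, w) frame latents
    target = vae_encode(next_frame)          # (B, C, h, w)

    # Standard diffusion objective: corrupt the target, predict the noise.
    noise = torch.randn_like(target)
    t = torch.randint(0, NUM_TIMESTEPS, (target.shape[0],), device=target.device)
    noisy = add_noise(target, noise, t)

    # History frames are stacked into the input channels; the action
    # sequence stands in for Stable Diffusion's text-prompt conditioning.
    history = past_latents.flatten(1, 2)     # (B, T*C, h, w)
    cond = action_embed(actions)             # (B, T, D) action embeddings
    pred = unet(torch.cat([noisy, history], dim=1), t, cond)

    return F.mse_loss(pred, noise)
```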

The approach still faces several obstacles. Stable Diffusion's pre-trained autoencoder compresses 8×8 pixel patches into 4 latent channels, which introduces artifacts in fine details, most visibly in the heads-up display (HUD) bar at the bottom of the screen. Maintaining "temporal coherence" (visual consistency over time) is another significant challenge. The researchers addressed it by adding varying amounts of random noise to the training data and training the model to correct that noise, which helps it keep the generated environment stable over longer play sessions.
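That noise-augmentation trick can be sketched in a few lines (the noise range, shapes, and function names here are assumptions): corrupt the conditioning frames with a random amount of Gaussian noise during training and tell the model how much was added, so that at inference time it learns to clean up its own accumulated drift rather than amplify it.

```python
import torch

def corrupt_context(past_latents, max_noise_level=0.7):
    """Noise augmentation for long rollouts (sketch; scale is an assumption).

    The context frames the model conditions on are corrupted with a random
    amount of Gaussian noise, and the sampled level is returned so it can be
    fed to the model as extra conditioning.
    """
    b = past_latents.shape[0]
    # Sample a per-example corruption level in [0, max_noise_level].
    level = torch.rand(b, device=past_latents.device) * max_noise_level
    lvl = level.view(b, *([1] * (past_latents.dim() - 1)))  # broadcastable
    noisy_context = past_latents + lvl * torch.randn_like(past_latents)
    return noisy_context, level
```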

#### The Larger Context: Advancements in Neural Rendering

GameNGen is part of a larger movement towards what could be termed “neural rendering.” Nvidia CEO Jensen Huang forecasted earlier this year that most video game graphics could be produced by AI in real time within the next five to ten years. GameNGen builds upon earlier developments in this area, including World Models (2018), GameGAN (2020), and Google’s own Genie model (2024), among others.

The notion of “world models” or “world simulators” is also gaining momentum, with AI video synthesis models such as Runway’s Gen-3 Alpha and OpenAI’s Sora exploring similar avenues. For example, OpenAI recently showcased Sora simulating *Minecraft*, marking another milestone in the journey toward AI-crafted interactive environments.

#### Constraints and Considerations

Although GameNGen is a notable advance, it has significant limitations. The model was trained exclusively on *Doom*, a game that already exists, and like many generative AI models, Stable Diffusion is good at imitation but poor at genuinely original creation. GameNGen also has access to only about three seconds of gameplay history, so when a player revisits part of a *Doom* level, the model must make probabilistic guesses about the earlier game state, which can produce inaccuracies.

Scaling this approach to more complex environments or other game genres will raise new challenges, and the computational cost of running similar models in real time may keep them out of widespread use for the near future. Even so, future gaming consoles could plausibly ship with dedicated "neural rendering" processors.

#### The Evolution of Game Production

GameNGen serves as a proof-of-concept highlighting a new approach to video game development. Currently, games are scripted by humans, but the creators of GameNGen foresee a future where games are viewed as “the weights of a neural model, not lines of code.” This could pave the way for a reality where new video games are generated through textual descriptions or image examples instead of traditional coding methods.

Envision the ability to transform a collection of still images into a new playable stage or character for an existing title, all based on examples rather than programming expertise. While this remains speculative, the potential is vast.

#### Conclusion

GameNGen provides an exciting preview of the future of video games, one in which AI models generate interactive worlds frame by frame rather than executing hand-written code. Its limitations are real, but as a proof of concept it points toward games that are generated rather than programmed.