Distinguishing between human-made and AI-generated images was once straightforward. Two years ago, image models couldn’t create a menu for a Mexican restaurant without fabricating dishes like “enchuita,” “churiros,” “burrto,” and “margartas.” Now, the latest ChatGPT Images 2.0 model can produce a restaurant menu that appears seamless, though a $13.50 ceviche might raise questions about fish quality.
For comparison, the output from DALL-E 3 two years ago was notably less polished. AI image generators have long struggled with spelling because they typically relied on diffusion models, which reconstruct an image from noise. Asmelash Teka Hadgu, founder and CEO of Lesan AI, explained in 2024 that image generators learn patterns spanning many pixels, while the text on an image makes up only a tiny fraction of it.
Researchers have explored alternatives like autoregressive models, which generate an image piece by piece, predicting each part from what came before, much as LLMs predict the next token. OpenAI has not specified what powers ChatGPT Images 2.0. However, the company says the new model includes “thinking capabilities” that let it search the web, produce multiple images from a single prompt, and verify its own creations. The model can generate marketing assets in various sizes and comic strips, and it has improved understanding of non-Latin text.
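The contrast between the two approaches can be sketched in a toy example. This is not how any OpenAI model works internally; it is a minimal illustration, with made-up "models", of why diffusion (denoising the whole canvas at once) treats small text as just a few pixels among many, while an autoregressive generator commits to one token at a time:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Diffusion-style generation (toy sketch) ---
# A diffusion model starts from pure noise and repeatedly refines the
# entire image at once. Fine details like lettering occupy few pixels,
# so they carry little weight in the overall pattern being learned.
def toy_denoise_step(noisy_image, step, total_steps):
    # Hypothetical "model": nudge every pixel toward a flat gray target.
    target = np.full_like(noisy_image, 0.5)
    alpha = 1.0 / (total_steps - step)  # pull harder as steps run out
    return noisy_image + alpha * (target - noisy_image)

image = rng.normal(size=(8, 8))  # begin from pure noise
steps = 10
for t in range(steps):
    image = toy_denoise_step(image, t, steps)

# --- Autoregressive-style generation (toy sketch) ---
# An autoregressive model emits the image one token (e.g. one patch)
# at a time, each conditioned on everything generated so far,
# the same way an LLM predicts the next word.
def toy_next_token(tokens):
    # Hypothetical "model": next token follows the running mean.
    prev = np.mean(tokens) if tokens else 0.5
    return prev + rng.normal(scale=0.01)

tokens = []
for _ in range(64):  # 64 tokens stand in for an 8x8 "image"
    tokens.append(toy_next_token(tokens))
```

In the diffusion loop every pixel is updated in parallel at each step; in the autoregressive loop each token is produced sequentially and can condition on its predecessors, which is one intuition for why such models handle text more reliably.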
“Images 2.0 brings an unprecedented level of specificity and fidelity to image creation,” OpenAI said, adding that it lets users conceptualize and execute with precision, generating sophisticated images that follow instructions and preserve detail at up to 2K resolution.
These features mean image generation isn’t as swift as typing a query, but even complex requests take only minutes. All ChatGPT and Codex users will get access to Images 2.0, with more advanced options reserved for paid users. Developers can use the model through the gpt-image-2 API, with pricing based on output quality and resolution.
