OpenAI releases ChatGPT Image 2.0, pushing AI image generation toward posters and UI

OpenAI unveiled ChatGPT Image 2.0. [Photo: OpenAI]

OpenAI has unveiled a new image generation model for ChatGPT called Image 2.0.

On April 21, major foreign media outlets including the tech news site TechCrunch reported that Image 2.0 substantially overcomes the limitations of earlier image generators, particularly in rendering text. Diffusion-based image AI has often produced spelling errors or awkward text layouts. As recently as two years ago, a request for a restaurant menu image often yielded nonexistent dish names or misspellings. For the same request, Image 2.0 is assessed as producing output that is hard to distinguish from human-made work.

In a briefing, OpenAI did not disclose details of the model architecture. It said it had given the new model "reasoning capabilities" that enable web search, multi-image generation and verification of results. These capabilities also make it possible to create marketing drafts in various sizes and comics made up of multiple scenes.

The range of text the model can handle has also expanded. Understanding and rendering accuracy for non-Latin scripts, including Japanese, Korean, Hindi and Bengali, improved significantly. As a result, Korean-language users are expected to find it more useful for work where text and layout accuracy matter, such as posters, notices and UI drafts. The model's knowledge cutoff is December 2025, which could reduce accuracy for requests involving more recent news or events.

In a press release, OpenAI emphasized that Image 2.0 provides an "unprecedented level of specificity and fidelity". It also said the model can faithfully render small text, icons, UI elements, complex compositions and detailed style constraints at up to 2K resolution.

These performance gains come at the cost of generation speed. The model cannot produce output instantly, but it can generate complex multi-panel images within minutes.

The industry is also watching a shift in the technical direction of image generation AI. In 2024, Asmelash Teka Hadgu, CEO of Lesan AI, explained that because diffusion models reconstruct their inputs, they focus on learning overall pixel patterns rather than small elements such as text in an image. Researchers have since been experimenting with autoregressive image generation models similar to large language models.

OpenAI also expanded access. Starting April 22, all ChatGPT and Codex users can use Image 2.0, and paid users can generate higher-quality output. The company is also offering a 'gpt-image-2' API, with pricing that depends on output quality and resolution.

These changes are expected to take Image 2.0 beyond a simple image generation tool into areas where text accuracy and fine control are critical, including document-style images, marketing drafts and UI mockups.

Seung-a Yoo ysah@d-today.co.kr
