How far has Google Nano Banana 2 come? Fact check with five tests

Google 'Nano Banana 2' [Photo: Google Blog]

[DigitalToday reporter Yoonseo Lee] Google’s message around 'Nano Banana 2' is clear. The company says the model is "faster, more logical, and can create scenes closer to reality from prompts." On Feb. 28 local time, IT outlet TechRadar published its own test of whether those claims hold up in actual results.

The first experiment amounted to a simultaneous test of physical logic, material rendering and text accuracy. The prompt called for a transparent glass sphere balanced precisely on the spout of a ceramic teapot, with the phrase 'CLARITY IS KEY' engraved inside in very small silver letters. The model first had to make the placement of small text inside the sphere physically plausible, then work out how the letters would appear refracted and distorted by the sphere’s curvature. The output kept the tiny text readable while showing natural distortion along the curved surface. It also kept the glass texture and reflections consistent in tone.

The second experiment targeted the 'visual confusion' and 'subject drift' that often occur in complex scenes. The prompt asked for a cinematic mood shot of a steampunk pirate ship sailing through a sea of clouds at sunset, with a hull mixing glossy brass, copper and dark wood, and anthropomorphic animal crew members on board. As prompt requirements pile up, models can blur the main subject or smear details, but in this test the main elements stayed relatively clear. Reflections and shading on the metal and wood surfaces did not clash within the frame, and the ship kept a structurally plausible form, with lighting and textures working together.

The third experiment directly tested 'text' and 'localization,' often cited as weaknesses of image generation models. It asked for a professional graphic design layout for a new board game, 'The Spice Route,' including a map and legend, and required the legend to list 'gold, silk, saffron' accurately in Japanese. It also required a complex central object made of stacked, interlocking ancient spice jars, and a visualization explaining gameplay that stayed logically consistent across multiple viewpoints. In the output, the Japanese text remained legible and unbroken and blended naturally into the layout. The map, legend and objects were organized into a single design system, with a level of polish close to an actual board game draft.

The fourth experiment checked how well spatial logic and texture rendering hold up when subjects from different eras and materials are combined in dynamic motion. The prompt called for knights in full plate armor and graffiti-painted 1980s-style robots facing off in a breakdance contest under modern stage lighting, on a cobblestone road in front of a medieval castle.

In this scene, the model had to render large, dynamic poses while naturally arranging conflicting elements in a single frame: the metallic texture of the armor, the painted and graffiti textures of the robots, the castle background and the modern lighting. The output captured dynamic movement without the distances and positions between subjects collapsing significantly. Material differences also remained relatively clear, including distinct metal highlights and robot surface textures.

The final experiment was effectively a comprehensive test. It asked for a rain-soaked Seattle street rendered as 'surreal but photorealistic,' with an observation deck visible in the distance and a convenience store sign and a cafe sandwich board appearing in the scene. It also required three characters to remain consistent throughout the scene.

The key to this experiment was a sense of place in the background and consistency in the foreground. The background needed plausible landmarks and street mood, while the foreground hinged on character consistency and text accuracy. In the output, the spatial composition did not waver significantly. Spelling and line arrangement held up relatively well even in elements with multiple lines of text, such as the sandwich board. Text also remained readable where details overlapped, including rain-wet pavement, lighting, signs and placards.

Taken together, the five tests suggest Nano Banana 2 is not simply a model that boosts sharpness or style, but one aimed at keeping compositions intact in scenes where physical, spatial and text elements are intertwined. A key point appears to be how much it reduces common errors in text rendering and material expression. Still, whether the final output comes across as an appealing image can vary by user purpose and preference, so prompt design and iterative adjustment remain important in real-world use.

Yoonseo Lee yslee@d-today.co.kr
