[DigitalToday reporter Jinju Hong (홍진주)] Google has unveiled Gemini Robotics ER 1.6, a robot-focused artificial intelligence (AI) model with stronger visual and spatial awareness and physical reasoning.
The online outlet Gigazine reported on Tuesday (local time) that the new model is designed to understand its surroundings and can even call search tools to carry out tasks.
At its core, the model improves a robot’s ability to interpret the physical environment on its own, rather than simply following instructions. Gemini Robotics ER 1.6 outperforms the previous generation in spatial and physical reasoning. Improved object-detection accuracy also lets it handle complex instructions, such as counting objects, identifying the fewest objects among those in view, and finding all items small enough to fit inside a specific cup.
The model can also judge spatial states, inferring, for example, whether a door is open. It can call Google Search as a built-in tool to find information it needs during a task, and it can invoke vision-language-action models and externally defined functions. This structure lets a robot act by combining external information with its own perception.
A notable new feature is the ability to read analog gauges. Gemini Robotics ER 1.6 can read analog instruments such as pressure gauges from visual information alone. Google said it applied “agentic vision,” which enlarges images to estimate ratios and spacing, and that gauge-reading performance improved significantly over the previous model, Gemini Robotics ER 1.5. The feature originated from requirements raised by Boston Dynamics, which is working with Google.
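To illustrate the kind of ratio arithmetic involved in reading a dial, here is a minimal sketch that converts a needle angle into a reading. It is not Google's method: it assumes a linear scale, and the function name, parameters, and example gauge (a 270-degree sweep covering 0 to 10 units) are hypothetical.

```python
def gauge_reading(needle_deg, min_deg, max_deg, min_value, max_value):
    """Linearly map a needle angle to a gauge value (hypothetical helper)."""
    if max_deg == min_deg:
        raise ValueError("scale endpoints must differ")
    # Fraction of the scale swept by the needle, clamped to [0, 1].
    ratio = (needle_deg - min_deg) / (max_deg - min_deg)
    ratio = max(0.0, min(1.0, ratio))
    # Interpolate between the values printed at the two ends of the scale.
    return min_value + ratio * (max_value - min_value)


# Assumed example gauge: needle halfway along a 270-degree, 0-10 scale.
print(gauge_reading(135.0, 0.0, 270.0, 0.0, 10.0))  # prints 5.0
```

A real gauge may have a nonlinear scale or an offset zero mark, in which case the interpolation step would need to change.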
Its ability to observe safety constraints has also improved. The new model is designed to better comply with physical safety limits, such as not handling liquids or not lifting objects heavier than 20 kg. It is also better at identifying hazards in its surroundings. The aim is for the robot to take into account not only task success but also constraints on its actions while carrying out a task.
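The two example limits above can be expressed as a simple pre-execution check. The sketch below is only an illustration of such a rule, not how the model enforces constraints internally; the `PlannedAction` type and `constraint_violations` function are hypothetical names.

```python
from dataclasses import dataclass

MAX_LIFT_KG = 20.0  # weight limit quoted in the article


@dataclass
class PlannedAction:
    """Hypothetical description of an action a robot is about to take."""
    description: str
    object_mass_kg: float
    involves_liquid: bool


def constraint_violations(action):
    """Return the reasons this action breaks the example safety limits."""
    reasons = []
    if action.involves_liquid:
        reasons.append("handles liquid")
    # "Over 20 kg" is read as strictly greater than the limit.
    if action.object_mass_kg > MAX_LIFT_KG:
        reasons.append("object over 20 kg")
    return reasons


crate = PlannedAction("lift crate", object_mass_kg=25.0, involves_liquid=False)
print(constraint_violations(crate))  # prints ['object over 20 kg']
```

An empty list means the planned action passes both checks; a planner could refuse or re-plan whenever the list is non-empty.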
Multi-view reasoning, which interprets footage from multiple cameras together, has also been strengthened. Gemini Robotics ER 1.6 more accurately understands the relationships between scenes captured by different cameras. This is expected to broaden the range of tasks in which it can grasp the positions of targets and the relationships between them in complex spaces.
Google stressed that reasoning about the physical world is essential to broadening the scope of robot use. It said that for robots to be truly useful in daily life and industrial settings, following instructions alone is not enough; they also need to understand the physical world. It added that reasoning combined with a robot’s senses, in tasks such as navigating complex facilities and reading a pressure gauge needle, bridges the gap between the digital and physical worlds.