Earlier this week, Google DeepMind released Gemini Robotics-ER-1.6, a new vision-language model designed to help robots make sense of their surroundings. To show off its capabilities, Boston Dynamics—which has an agreement to use Gemini in its humanoid robots—published a video of its robot dogs using the model to read a thermometer during an inspection of an industrial facility.
Despite the eye-catching demos, Google’s new robotics model notched only incremental gains over previous models in its ability to tell when it had finished a task, according to Google’s benchmarks, when using a single camera feed. But when given multiple camera feeds, the model showed improvement. That’s important, Google says, because many robotics setups today, such as those in factories or warehouses, use multiple camera views, like an overhead camera and a camera mounted on the robot’s arm. The robot must be able to use all of those cameras to build a coherent understanding of what it’s doing and know when the task is complete.