Google DeepMind, the company’s AI research lab, has announced a new artificial intelligence model called Gemini 2.5 Computer Use. This specialized model is designed to browse the internet and operate software applications in a way that mimics human interaction.

The Gemini 2.5 Computer Use model is now available in a preview phase for developers. They can access it through the Gemini API in Google AI Studio and Vertex AI.

This technology allows an AI agent to take control of a graphical user interface. This means it can perform tasks like filling out online forms, clicking buttons, scrolling through pages, and even navigating websites that require a login. Essentially, it can see a screen and take action based on what it sees.

[Figure: Gemini 2.5 Computer Use model flow]

The process works in a loop. The AI is given a user’s request and a screenshot of the current screen. It then suggests an action, like typing text or clicking a specific button. This action is carried out by the software, a new screenshot is taken, and the cycle repeats until the task is finished.
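
In code terms, that loop might look something like the sketch below. Every name in it (Action, take_screenshot, request_next_action, perform_action) is a hypothetical placeholder standing in for real screenshot-capture, model, and GUI-automation calls; none of it is part of an actual Google SDK.

```python
# A minimal sketch of the observe-act loop described above, assuming
# hypothetical helpers for screenshots, the model call, and GUI control.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    kind: str                       # e.g. "click", "type_text", "scroll", "done"
    payload: Optional[dict] = None  # e.g. coordinates or text to type


def take_screenshot() -> bytes:
    """Placeholder: capture the current screen as an image."""
    return b""


def request_next_action(user_request: str, screenshot: bytes) -> Action:
    """Placeholder: send the request plus the latest screenshot to the
    model and return the single action it proposes."""
    return Action(kind="done")


def perform_action(action: Action) -> None:
    """Placeholder: carry out the proposed action in the GUI."""


def run_task(user_request: str, max_steps: int = 20) -> None:
    # Screenshot -> model proposes an action -> execute it -> repeat,
    # until the model signals the task is finished or we hit max_steps.
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = request_next_action(user_request, screenshot)
        if action.kind == "done":
            break
        perform_action(action)
```

The key point the sketch illustrates is that the model never touches the computer directly: it only proposes one action at a time, and the client software decides how (and whether) to execute it before sending back a fresh screenshot.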

Google says that Gemini 2.5 has shown impressive results on standard tests that measure how well AI can control web and mobile applications. These benchmarks include Online-Mind2Web, WebVoyager, and AndroidWorld.

However, the company also acknowledges the potential risks of an AI that can control a computer. There is a danger of misuse, unexpected behavior, and vulnerability to online scams.

To address this, Google states that it has built safety features directly into the model. It also gives developers the ability to set controls. For example, a developer can program the AI to refuse a risky action or to stop and ask for human confirmation before proceeding with a high-stakes task, like making a purchase.
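
A developer-side gate of that kind could be sketched roughly as below. The HIGH_STAKES_KINDS set, the action names, and the gate_action and require_confirmation helpers are all illustrative assumptions for this article, not Google's actual API.

```python
# A sketch of a developer-defined control layer: the client code, not the
# model, decides which proposed actions need human sign-off before running.

HIGH_STAKES_KINDS = {"purchase", "submit_payment", "delete_account"}  # assumed labels


def require_confirmation(description: str) -> bool:
    """Ask a human to approve a high-stakes step before it runs."""
    answer = input(f"The agent wants to: {description}. Allow? [y/N] ")
    return answer.strip().lower() == "y"


def gate_action(action_kind: str, description: str) -> bool:
    """Return True if the proposed action may proceed."""
    if action_kind in HIGH_STAKES_KINDS:
        return require_confirmation(description)
    return True  # low-risk actions proceed automatically
```

In practice, a check like gate_action would sit inside the execution step of the loop shown earlier, so a risky action is either refused or paused for human confirmation before anything happens on screen.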
