Gemini 2.5 ‘Computer Use’ AI Model Can Navigate Websites Autonomously

Google is rolling out an ambitious new AI model designed to interact with the internet in a strikingly human way. Called Gemini 2.5 Computer Use, this specialized AI can navigate web browsers, click buttons, fill out forms, and even scroll through pages—all based on a simple text prompt. It is a significant step toward creating AI agents that can perform complex digital tasks autonomously. The model can go beyond simple chatbot responses to actively engage with user interfaces.

Built on the capabilities of Gemini 2.5 Pro, this AI model differentiates itself by operating within a virtual browser environment. Unlike some rival AI agents that can access an entire desktop operating system, Google’s model focuses specifically on web and mobile interfaces. This approach allows it to tackle everyday digital chores that previously required human intervention or complex API integrations. Think about an AI filling out a detailed online form, navigating a cluttered website, or adding items to a shopping cart based on a list—all with minimal fuss.

Gemini 2.5 Computer Use model is Google’s new AI agent

The core of Gemini 2.5 Computer Use lies in an iterative feedback loop. When a user gives the AI a task, the model first receives the request, a screenshot of the current screen, and a history of its previous actions. It then processes this information and proposes a specific UI action, such as clicking a link, typing text into a field, or scrolling down. Client-side code executes the action, the screen updates, and a new screenshot is sent back to the AI. This loop continues until the original task is complete.
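
To make the loop concrete, here is a minimal Python sketch of the cycle described above. It is an illustration under stated assumptions, not Google's implementation or the Gemini API: `Action`, `FakeBrowser`, `scripted_model`, and `run_agent` are hypothetical names, the "screenshot" is a text summary rather than an image, and the model call is replaced by a small scripted function so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One proposed UI action (hypothetical shape, not the API's schema)."""
    kind: str          # "click", "type", "scroll", or "done"
    target: str = ""
    text: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for the client-side environment that executes actions."""
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> str:
        # A real client would capture image bytes; text keeps this runnable.
        return f"fields={sorted(self.fields.items())} submitted={self.submitted}"

    def execute(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_model(task: str, screenshot: str, history: List[Action]) -> Action:
    """Placeholder for the model: picks the next action from its three inputs."""
    if "submitted=True" in screenshot:
        return Action("done")
    if not any(a.kind == "type" for a in history):
        return Action("type", target="email", text="user@example.com")
    return Action("click", target="submit")


def run_agent(
    task: str,
    browser: FakeBrowser,
    propose: Callable[[str, str, List[Action]], Action],
    max_steps: int = 10,
) -> List[Action]:
    """Request + screenshot + history in, one action out, repeated until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = propose(task, browser.screenshot(), history)
        if action.kind == "done":
            break
        browser.execute(action)   # client-side code performs the action
        history.append(action)    # the next turn sees what was already done
    return history


browser = FakeBrowser()
history = run_agent("Sign up for the newsletter", browser, scripted_model)
```

The `max_steps` cap is a design choice of this sketch: it keeps the loop from running indefinitely if the stand-in model never reports that the task is complete.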

Google has optimized this model primarily for web browsers. However, it also shows promise for mobile app control. Google already uses versions of this model internally for tasks like UI testing, which speeds up software development.

Performance and safety in focus

Google claims the Gemini 2.5 Computer Use model “outperforms leading alternatives on multiple web and mobile benchmarks” with lower latency. Demonstrations show the AI competently handling tasks like playing the game 2048 or browsing websites. Interestingly, brief tests even show it solving Google Search CAPTCHAs, a significant hurdle for non-human users.

However, Google is also emphasizing safety. The company is aware of the unique risks associated with AI agents that control computers: bad actors could misuse them, and the AI itself could behave unexpectedly. With this in mind, the company has built safety features directly into the model. Developers also receive tools to prevent the AI from performing high-risk actions, such as compromising system security or bypassing CAPTCHAs, without explicit user permission.
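
As a rough illustration of what "explicit user permission" could look like on the client side, the sketch below gates high-risk actions behind a confirmation callback. This is an assumption-laden example, not Google's safety tooling: the `HIGH_RISK` set, the action names in it, and the `may_execute` function are all hypothetical.

```python
from typing import Callable

# Hypothetical labels for actions the client treats as high-risk.
HIGH_RISK = {"bypass_captcha", "change_security_settings"}


def may_execute(action_kind: str, confirm: Callable[[str], bool]) -> bool:
    """Allow low-risk actions; require explicit confirmation for the rest."""
    if action_kind not in HIGH_RISK:
        return True
    # `confirm` stands in for asking the user; it must return True to proceed.
    return bool(confirm(action_kind))


def deny(kind: str) -> bool:
    """Example callback for a user who refuses every request."""
    return False


allowed_click = may_execute("click", deny)
allowed_captcha = may_execute("bypass_captcha", deny)
```

In this sketch an ordinary click goes through without prompting, while the CAPTCHA bypass is blocked unless the callback returns True.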

Currently, Gemini 2.5 Computer Use is available for developers through the Gemini API in Google AI Studio and Vertex AI. It is not yet directly accessible to consumers. That said, this technology paves the way for a future where AI handles more of our routine digital interactions.

android-tech

