
Browser & Computer Control

When the agent needs to use your machine the way a person would (opening a page, taking a screenshot, clicking a button, typing a phrase), it does so through these tools.

Browser

Open a URL in an embedded webview whose contents the agent can read back
Screenshot the current page
Inspect image output and metadata, so the agent can describe what it sees

The embedded browser is built on CEF (Chromium Embedded Framework).
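A screenshot is image bytes plus some metadata the agent can reason about. As a minimal sketch, assuming the screenshot comes back as raw PNG bytes (the actual return format of the screenshot tool isn't documented here), the function below reads the width, height, bit depth, and color type from the PNG header's IHDR chunk. The name `png_metadata` is hypothetical and not part of the tool's API.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_metadata(data: bytes) -> dict:
    """Hypothetical helper: read basic metadata from PNG bytes.

    Assumes the screenshot is a PNG; IHDR is always the first chunk.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG image")
    # Chunk layout: 4-byte big-endian length, 4-byte type, then the chunk data.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    # IHDR data begins with width, height (4 bytes each), bit depth, color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,  # e.g. 6 = RGBA, 2 = RGB
    }
```

Reading only the header keeps this cheap: the agent can learn the page's rendered size without decoding any pixels.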

Computer (mouse + keyboard)

Mouse - move, click, drag
Keyboard - type text, send key chords
Human path - mouse movements follow human-like trajectories instead of teleporting the cursor straight to the target
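One common way to get human-like cursor movement is to sample points along a gently curved path, with small steps near the start and end and larger ones in the middle, rather than jumping directly to the target. The sketch below is hypothetical: it does not describe how this tool actually computes its trajectories, and `human_path` is an illustrative name. It bends a cubic Bézier curve with randomly offset control points and spaces the samples with smoothstep easing.

```python
import math
import random


def human_path(start, end, steps=30, seed=None):
    """Hypothetical sketch: return integer (x, y) points from start to end.

    Uses a cubic Bezier curve whose two control points are pushed off the
    straight line by a random amount, sampled with ease-in-out timing.
    """
    rng = random.Random(seed)  # a seed makes the path reproducible
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    dist = math.hypot(dx, dy)
    if dist == 0:
        return [tuple(start)]
    # Unit vector perpendicular to the straight line, used to bend the path.
    px, py = -dy / dist, dx / dist
    bend1 = rng.uniform(-0.2, 0.2) * dist
    bend2 = rng.uniform(-0.2, 0.2) * dist
    # Control points at roughly 1/3 and 2/3 of the way, offset sideways.
    x1, y1 = x0 + dx / 3 + px * bend1, y0 + dy / 3 + py * bend1
    x2, y2 = x0 + 2 * dx / 3 + px * bend2, y0 + 2 * dy / 3 + py * bend2

    points = []
    for i in range(steps + 1):
        t = i / steps
        # Smoothstep easing: velocity is zero at both ends, highest mid-move.
        s = t * t * (3 - 2 * t)
        u = 1 - s
        x = u**3 * x0 + 3 * u**2 * s * x1 + 3 * u * s**2 * x2 + s**3 * x3
        y = u**3 * y0 + 3 * u**2 * s * y1 + 3 * u * s**2 * y2 + s**3 * y3
        points.append((round(x), round(y)))
    return points
```

For example, `human_path((0, 0), (300, 200), seed=1)` yields 31 points that begin at `(0, 0)` and end at `(300, 200)`; each point would then be fed to whatever mouse-move primitive is available, with a short delay between moves.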

What it's good for

Driving sites that don't have an API or a native integration
Multi-step UI flows where a single screenshot isn't enough
Automating local apps from inside a chat

See also

Web Scraper - for when you only need the article text, not the whole page
