web_fetch — single-page apps, sites behind logins, dynamic dashboards, file uploads, or workflows that require multiple steps. For static page content, the lighter web_fetch tool is faster and cheaper.
Browser Actions
The browser tool supports 16 actions, verified against BROWSER_TOOL_ACTIONS in packages/skills/src/builtin/platform/browser-tool-schema.ts:
| Action | What It Does |
|---|---|
| status | Check if the browser is running or stopped |
| start | Launch the browser |
| stop | Close the browser |
| profiles | Manage isolated browser profiles |
| tabs | List all open tabs |
| open | Open a new tab with a URL |
| focus | Switch to a specific tab (by targetId) |
| close | Close a tab (by targetId) |
| snapshot | Get the page accessibility tree (aria or ai format) |
| screenshot | Capture a page screenshot |
| navigate | Go to a URL in the current tab |
| console | View browser console output |
| pdf | Save the current page as a PDF |
| upload | Upload a file to the page |
| dialog | Handle browser dialogs (alerts, confirms, prompts) |
| act | Interact with page elements (11 sub-kinds) |
Page Actions
navigate -- Go to a URL
screenshot -- Capture a screenshot
Take a screenshot of the current page. The image is returned as a message attachment.
Screenshots are useful for visual verification — checking how a page looks, confirming form submissions, or capturing error states.
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | "screenshot" |
| targetId | string | No | The tab to capture (defaults to the active tab) |
| fullPage | boolean | No | Capture the full scrollable page (not just the viewport) |
| type | string | No | Image format: "png" (lossless) or "jpeg" (compressed) |
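As a rough illustration of the parameters above, a screenshot call could be typed and built as follows. The `ScreenshotCall` interface and the `fullPageScreenshot` helper are hypothetical names introduced for this sketch; only the field names and values come from the table.

```typescript
// Shape of a screenshot call, using only the parameters listed in the table.
interface ScreenshotCall {
  action: "screenshot";
  targetId?: string;
  fullPage?: boolean;
  type?: "png" | "jpeg";
}

// Hypothetical helper (not part of the tool): builds a lossless full-page capture.
// When targetId is omitted, the tool falls back to the active tab.
function fullPageScreenshot(targetId?: string): ScreenshotCall {
  const result: ScreenshotCall = { action: "screenshot", fullPage: true, type: "png" };
  if (targetId !== undefined) {
    result.targetId = targetId;
  }
  return result;
}

const screenshotCall = fullPageScreenshot();
```

Leaving `targetId` out keeps the call valid for whichever tab is currently active, which is usually what a single-tab workflow wants.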
snapshot -- Get the accessibility tree
Get a structured representation of the page content using the accessibility tree. This is how agents “read” web pages.
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | "snapshot" |
| targetId | string | No | The tab to snapshot (defaults to the active tab) |
| snapshotFormat | string | No | "aria" for the raw accessibility tree, or "ai" for an AI-optimized format |
| mode | string | No | Snapshot mode: "efficient" for compact output optimized for LLM consumption |
| refs | string | No | Element reference type: "role" (ARIA role references) or "aria" (ARIA label references) |
| interactive | boolean | No | Show only interactive elements |
| compact | boolean | No | Use compact output format |
| depth | number | No | Maximum tree depth to include |
| maxChars | number | No | Maximum characters in the snapshot output |
Format options:
- aria: The full accessibility tree with all ARIA roles and properties. Detailed but verbose.
- ai: A condensed format designed for AI consumption. Shows interactive elements (buttons, links, inputs) with labels and references that can be used with the act action.

The ai format is recommended for most use cases: it gives agents the information they need to interact with the page without overwhelming them with structural details.
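A minimal sketch of a snapshot call that follows the recommendation above: the ai format, restricted to interactive elements, with a character cap. `SnapshotCall` and `interactiveSnapshot` are hypothetical names for illustration, and the 8000-character cap is an arbitrary example value rather than a documented default.

```typescript
// Shape of a snapshot call, mirroring the parameter table.
interface SnapshotCall {
  action: "snapshot";
  targetId?: string;
  snapshotFormat?: "aria" | "ai";
  mode?: string;
  refs?: "role" | "aria";
  interactive?: boolean;
  compact?: boolean;
  depth?: number;
  maxChars?: number;
}

// Hypothetical helper: a compact, interaction-focused snapshot capped at maxChars.
function interactiveSnapshot(maxChars: number): SnapshotCall {
  return {
    action: "snapshot",
    snapshotFormat: "ai",
    interactive: true,
    compact: true,
    maxChars,
  };
}

const snapshotCall = interactiveSnapshot(8000);
```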
act -- Interact with page elements
Interact with elements on the page. The act action supports 11 different interaction types (called “sub-kinds”), each designed for a specific type of interaction.
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | "act" |
| request.kind | string | Yes | The interaction type (see table below) |
| request.* | (varies) | (varies) | Additional parameters depend on the kind |

The act action wraps its interaction parameters inside a request object. See the Interaction Types section below for all 11 sub-kinds and their parameters.
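The nesting is easy to get wrong, so here is a small sketch of the wrapped shape. The `act` helper and `ActCall` type are hypothetical and exist only to show that `kind` and its parameters sit inside `request` rather than next to `action`.

```typescript
// An act call: interaction parameters live inside `request`, not at the top level.
interface ActCall {
  action: "act";
  request: { kind: string; [param: string]: unknown };
}

// Hypothetical helper that performs the wrapping.
// `kind` is written last so that a stray "kind" key in params cannot override it.
function act(kind: string, params: Record<string, unknown> = {}): ActCall {
  return { action: "act", request: { ...params, kind } };
}

const clickCall = act("click", { ref: "#login-button" });
// clickCall.request holds both the kind ("click") and the ref ("#login-button").
```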
open -- Open a new tab
Open a new browser tab and navigate to a URL.
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | "open" |
| targetUrl | string | No | The URL to open in the new tab |

SSRF validation applies to the URL (same as navigate).
pdf -- Save page as PDF
Save the current page as a PDF document, returned as a message attachment.
Useful for archiving web pages or generating printable versions of online content.
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | "pdf" |
upload -- Upload files
Upload one or more files to a file input element on the page.
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | "upload" |
| inputRef | string | No | CSS selector or aria reference for the file input element |
| paths | string[] | No | Array of file paths to upload |
dialog -- Handle browser dialogs
Respond to browser dialogs such as JavaScript alerts, confirmation prompts, and input prompts.
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | "dialog" |
| accept | boolean | No | Whether to accept (true) or dismiss (false) the dialog |
| promptText | string | No | Text to enter in a prompt dialog |
console -- View console output
Retrieve the browser’s developer console output, including errors, warnings, and log messages.
Useful for debugging page issues or monitoring JavaScript errors.
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | "console" |
Interaction Types (act sub-kinds)
When using the act action, the kind parameter determines what type of interaction to perform. There are 11 interaction types:
| Kind | What It Does | Key Parameters |
|---|---|---|
| click | Click an element | ref (element reference or CSS selector), doubleClick, button, modifiers |
| type | Type text character by character | ref, text, submit, slowly |
| press | Press a keyboard key | key (e.g., "Enter", "Tab", "Escape") |
| hover | Hover over an element | ref |
| drag | Drag from one element to another | startRef, endRef (both element references) |
| select | Select a dropdown option | ref, values (array of option values) |
| fill | Fill form fields (clears existing content first) | ref, fields (array of field objects) |
| resize | Resize the browser viewport | width, height |
| wait | Wait for a condition | timeMs (milliseconds), textGone (text to wait to disappear) |
| evaluate | Run JavaScript in the page | fn (JavaScript code) |
| close | Close the current page or dialog | (no additional parameters) |
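One way to keep each kind paired with its own parameters is a discriminated union. The sketch below covers only four of the 11 kinds and takes its field names from the table; the `ActRequest` type, the `describeStep` helper, and which fields are optional are assumptions made for illustration.

```typescript
// A subset of the interaction types, modeled as a discriminated union on `kind`.
type ActRequest =
  | { kind: "click"; ref: string; doubleClick?: boolean }
  | { kind: "type"; ref: string; text: string; submit?: boolean; slowly?: boolean }
  | { kind: "press"; key: string }
  | { kind: "wait"; timeMs?: number; textGone?: string };

// Hypothetical helper: a one-line description of a step, e.g. for logging.
function describeStep(req: ActRequest): string {
  switch (req.kind) {
    case "click":
      return `click ${req.ref}`;
    case "type":
      return `type ${JSON.stringify(req.text)} into ${req.ref}`;
    case "press":
      return `press ${req.key}`;
    case "wait":
      return req.textGone !== undefined
        ? `wait until "${req.textGone}" disappears`
        : `wait ${req.timeMs ?? 0} ms`;
  }
}

// A search interaction expressed as an ordered list of act requests.
const searchSteps: ActRequest[] = [
  { kind: "click", ref: "#search" },
  { kind: "type", ref: "#search", text: "quarterly report" },
  { kind: "press", key: "Enter" },
  { kind: "wait", textGone: "Loading" },
];

const searchLog = searchSteps.map(describeStep);
```

With the union in place, the compiler rejects a step such as `{ kind: "press", ref: "#search" }`, which catches mismatched parameters before a call is ever sent.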
Selecting Elements
Most interaction types require a ref parameter to identify which element to interact with. You can use:
- CSS selectors: Standard CSS selectors like #login-button, .submit-btn, or input[name="email"]
- Aria references: References from the snapshot output in ai format, which label interactive elements with short identifiers
Browser Profiles
Profiles create isolated browser contexts with separate cookies, local storage, and session data. This is useful for:
- Managing multiple accounts on the same website
- Testing with different user states (logged in vs. logged out)
- Keeping browsing sessions separate between different tasks

Use the profiles action to create, list, and switch between browser profiles.
Stealth mode (anti-bot detection)
By default the browser runs headless Chrome — fine for most public sites and internal tools, but instantly flagged by modern bot-detection services (Cloudflare Turnstile, FingerprintJS, BrowserScan, reCAPTCHA v3 scoring, Reddit’s secondary fingerprint check). For agents that hit those sites you can install Comis with progressively stronger stealth. Three install-time flags compose together. They’re available in both the bare-metal installer (install.sh) and the Docker image (build args):
| Flag | What it adds | When you need it |
|---|---|---|
| --with-browser | Stock Google Chrome + headless shared libs. Sandbox ReadWritePaths widened for Chrome’s out-of-profile writes (~/.config/google-chrome, ~/.local/share/applications). | Baseline. The browser tool needs a Chrome binary to launch; this provisions one if you didn’t bring your own. |
| --with-xvfb | Adds Xvfb + a comis-xvfb.service systemd companion that runs a virtual display on :99. The main daemon unit joins its /tmp namespace so the X11 socket is reachable. Config seeded with headless: false. | Sites that detect headless mode itself (BrowserScan, DataDome-tier, Cloudflare managed). On the test VPS, BrowserScan flipped from Robot to Normal just by switching to headed mode via Xvfb. |
| --with-cloakbrowser | Installs CloakBrowser, a stealth Chromium fork with source-level fingerprint patches at the C++ level (canvas, WebGL, audio, fonts, GPU, screen, WebRTC, hardware reporting). findChrome() auto-detects and prefers it over stock Chrome. Sandbox paths are tighter than the Chrome variant. | Sites that fingerprint visitors. Verified bypass on Cloudflare Turnstile (non-interactive), FingerprintJS, BrowserScan, bot.incolumitas (1 fail vs 4 fails for stock Chrome; only the irreducible W3C WebDriver-spec leak remains), and Reddit’s secondary fingerprint check on non-datacenter IPs. |
Verified results (Ubuntu 24.04 VPS, head-to-head against the same probes)
| Config | bot.incolumitas detection-tests | browserscan.net verdict |
|---|---|---|
| Stock Chrome, headless | 4 fails (UA leak, HEADCHR_UA, WEBDRIVER, CHR_MEMORY) | Robot |
| Stock Chrome, headed via Xvfb | 1 fail (WEBDRIVER spec only) | Normal |
| CloakBrowser, headless | 1 fail (WEBDRIVER spec only) | Normal |
| CloakBrowser + Xvfb, headed | 1 fail (WEBDRIVER spec only) | Normal |
The WEBDRIVER fail is unavoidable for any CDP-connected browser: it’s a W3C-spec observable side effect, not a fingerprint defect. CloakBrowser’s own documentation calls this out as the single irreducible cost.
Install examples
findChrome() probes ~/.cloakbrowser/chromium-*/chrome first, then platform-specific Chrome paths. Whatever was installed wins.
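The probe order described above can be sketched as a pure function. This is not the real findChrome(), whose signature and internals are not shown on this page; `pickBrowser`, its parameters, and the example paths are hypothetical. The function takes the list of paths that exist on disk so the ordering logic can be shown without touching the filesystem.

```typescript
// Escape a string for literal use inside a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical stand-in for findChrome(): prefer a CloakBrowser binary under
// <home>/.cloakbrowser/chromium-*/chrome, otherwise take the first installed
// path from a platform-specific Chrome candidate list.
function pickBrowser(
  home: string,
  installed: string[],
  chromeCandidates: string[],
): string | undefined {
  const cloakPattern = new RegExp(
    `^${escapeRegExp(home)}/\\.cloakbrowser/chromium-[^/]*/chrome$`,
  );
  const cloakHit = installed.find((p) => cloakPattern.test(p));
  if (cloakHit !== undefined) {
    return cloakHit;
  }
  return chromeCandidates.find((p) => installed.includes(p));
}

// Example paths are illustrative only.
const chromeOnly = ["/usr/bin/google-chrome"];
const chromeAndCloak = [
  "/usr/bin/google-chrome",
  "/home/comis/.cloakbrowser/chromium-123/chrome",
];
const candidates = ["/usr/bin/google-chrome", "/usr/bin/chromium"];

const pickedWithCloak = pickBrowser("/home/comis", chromeAndCloak, candidates);
const pickedWithoutCloak = pickBrowser("/home/comis", chromeOnly, candidates);
```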
License note
CloakBrowser’s wrapper is MIT-licensed; the compiled binary is under a separate license. It is free for self-hosted use (including bringing it into your own VPS / Docker deploys), but bundling it into a service you distribute to third-party customers requires a separate OEM license from CloakHQ. See the CloakBrowser binary license for the full terms.
Security
The browser includes built-in security protections:
- SSRF validation: The navigate and open actions validate URLs before loading them. The browser cannot access localhost, internal IP addresses (like 10.x.x.x or 192.168.x.x), or cloud metadata endpoints (like 169.254.169.254). This prevents attacks where a crafted message tricks the browser into accessing your internal network.
- Screenshot sanitization: Screenshots are processed and sanitized before being stored or sent.
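To make the SSRF rule concrete, the following is a minimal sketch of the kind of check involved. It is not Comis's validator, which is not shown here and presumably covers more cases (DNS resolution, redirects, most IPv6 ranges). The `isBlockedUrl` name is hypothetical, and the loopback and 172.16.0.0/12 checks are additions beyond the examples named in the text.

```typescript
// Illustrative SSRF pre-check: returns true when a URL should be refused.
function isBlockedUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true; // unparsable input is refused outright
  }
  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost") || host === "[::1]") {
    return true;
  }
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (match === null) {
    // Not an IPv4 literal. A real validator would also resolve the name
    // and check the resulting addresses.
    return false;
  }
  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 127 || // loopback
    a === 10 || // 10.0.0.0/8
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 169 && b === 254) // link-local, including 169.254.169.254 metadata
  );
}

const metadataBlocked = isBlockedUrl("http://169.254.169.254/latest/meta-data/");
const publicAllowed = !isBlockedUrl("https://example.com/");
```

Checking the parsed hostname rather than the raw string matters because URL parsing normalizes the host before the comparison runs.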
End-to-end example: scraping a SPA
A typical multi-step workflow: visit a JavaScript-rendered dashboard, log in, navigate to a data view, and pull a value the agent can act on. The agent strings together navigate, snapshot, act, and screenshot in sequence. The resulting screenshot can then be delivered through the message tool’s attach action.
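The sequence described above might look like the following list of tool calls. This is a sketch under several assumptions: the navigate parameter is taken to be `targetUrl` (borrowed from the open action, since navigate's own parameters are not listed here), the URL, selectors, and credentials are placeholders, and the exact form expected for `fn` (expression versus function body) is not specified on this page.

```typescript
// A loosely typed browser call: an action name plus its parameters.
type BrowserCall = { action: string; [param: string]: unknown };

const dashboardWorkflow: BrowserCall[] = [
  // 1. Load the login page (parameter name assumed, see note above).
  { action: "navigate", targetUrl: "https://dashboard.example.com/login" },
  // 2. Read the page to discover the form fields.
  { action: "snapshot", snapshotFormat: "ai", interactive: true },
  // 3. Fill in credentials and submit.
  { action: "act", request: { kind: "type", ref: 'input[name="email"]', text: "agent@example.com" } },
  { action: "act", request: { kind: "type", ref: 'input[name="password"]', text: "<password>", submit: true } },
  // 4. Wait for the SPA to finish rendering the data view.
  { action: "act", request: { kind: "wait", textGone: "Loading" } },
  // 5. Pull a value out of the rendered DOM.
  { action: "act", request: { kind: "evaluate", fn: "document.querySelector('#revenue').innerText" } },
  // 6. Capture the final state for visual verification.
  { action: "screenshot", fullPage: true },
];

const actionsUsed = dashboardWorkflow.map((step) => step.action);
```

Taking a snapshot before interacting lets the agent confirm the selectors or aria references it is about to use, instead of guessing at the page structure.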
For sites that emit network calls you want to inspect, the agent can pair the browser with console (to see JS logs) and act kind: evaluate (to run JavaScript like document.querySelector(...).innerText). For high-throughput scraping that does not need rendered JavaScript, prefer web_fetch.
Related
- Built-in Tools: All built-in tools including browser
- Web Tools: Web search and page fetching tools
- Agent Tools Overview: See all available agent tools
- Config Reference: Browser and tool configuration options
