Web Scraper

A purpose-built fetch tool, separate from the generic http_request tool. It exists because the agent doesn't want raw HTML - it wants the article.

What it does

Fetches a URL
Strips boilerplate (nav, ads, footer, scripts)
Returns clean text the agent can reason over
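
The stripping step can be sketched with Python's standard-library HTML parser. This is only an illustration of the idea, not the tool's actual implementation: the SKIP_TAGS set and the extract_text name are assumptions, and ad blocks (which have no dedicated tag) are not handled here.

```python
from html.parser import HTMLParser

# Hypothetical list of tags treated as boilerplate; the real tool's rules may differ.
SKIP_TAGS = {"nav", "footer", "script", "style", "aside"}


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything nested inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside one or more skipped tags
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

Calling extract_text on a page with a nav bar, an article, and a footer returns only the article's text, which is the shape of output the agent reasons over.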

Guardrails

Caps response at 1 MB - large pages get truncated
20-second timeout - slow servers don't stall the conversation
Subject to proxy and URL-guard rules
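
A minimal sketch of how such guardrails might be enforced, using only the standard library. The names read_capped and fetch are hypothetical, the scheme check is a stand-in for the real URL-guard rules (which are not described here), and "1 MB" is assumed to mean 1,000,000 bytes.

```python
import urllib.request
from urllib.parse import urlparse

MAX_BYTES = 1_000_000   # assumption: 1 MB = 10**6 bytes (could also be 2**20)
TIMEOUT_SECONDS = 20


def read_capped(stream, max_bytes=MAX_BYTES, chunk_size=65536):
    """Read at most max_bytes from a file-like object.

    Returns (data, truncated). One extra byte is requested so that a body
    of exactly max_bytes is not reported as truncated.
    """
    chunks, total = [], 0
    while total <= max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes + 1 - total))
        if not chunk:
            break
        chunks.append(chunk)
        total += len(chunk)
    data = b"".join(chunks)
    return data[:max_bytes], len(data) > max_bytes


def fetch(url):
    """Fetch a URL with a size cap and a timeout."""
    # Stand-in for URL-guard rules: only plain web schemes are allowed.
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("only http(s) URLs are allowed")
    # Note: urlopen's timeout applies to each blocking socket operation,
    # not to the request as a whole.
    with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as resp:
        return read_capped(resp)
```

Reading in bounded chunks means an oversized page never has to be held in memory in full before it is cut off.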

What it's good for

Reading articles, blog posts, docs pages, GitHub READMEs without the noise
Following up on a Web Search result
Summarising a single page on demand

See also

Web Search - Find URLs to feed into the scraper
Token Compression - What trims long pages before they hit the model
