
Fetch

@flow-state-dev/tools — Fetch a single web page and return its content as clean, LLM-ready markdown.

Why this exists

Agents need to read web pages: documentation, articles, user-shared links, and search results worth reading in full. Raw HTML is noisy and wastes context tokens. tools.fetch handles the fetching, content extraction, and HTML-to-markdown conversion so you don't have to.

Three providers, auto-selected by what's available:

| Provider | How it works | JS rendering | Anti-bot | Env var |
| --- | --- | --- | --- | --- |
| Firecrawl | Managed API, best quality | Yes | Yes | FIRECRAWL_API_KEY |
| Jina Reader | HTTP API via r.jina.ai | Yes (ReaderLM) | Partial | JINA_API_KEY (optional) |
| Built-in | Node.js fetch + Readability + Turndown | No | No | None needed |

The built-in fallback always works. No API keys, no external services. It handles static HTML well — documentation sites, blog posts, articles. For JS-rendered SPAs or pages behind anti-bot protection, you'll want Firecrawl or Jina.

Basic usage

import { generator } from "@flow-state-dev/core";
import { tools } from "@flow-state-dev/tools";

const reader = generator({
  name: "reader",
  model: "anthropic/claude-sonnet-4-6",
  prompt: "Read URLs the user provides and summarize them.",
  tools: [tools.fetch()],
});

The LLM calls fetch with a URL and gets back markdown content, a title, and metadata.

Configuration

tools.fetch({
  // Force a specific provider instead of auto-detection
  provider: "firecrawl", // "firecrawl" | "jina" | "builtin"

  // Enable JS rendering (Firecrawl and Jina only)
  waitForJS: true,

  // Explicit API keys (overrides env vars)
  keys: {
    firecrawl: "fc-...",
    jina: "jina_...",
  },
})

All options are optional. With no config, the tool auto-detects providers from environment variables and falls back to built-in.

Provider resolution

The tool checks for available providers in this order:

  1. FIRECRAWL_API_KEY set? → Firecrawl (deterministic, JS rendering, anti-bot)
  2. JINA_API_KEY set? → Jina Reader (deterministic, ReaderLM markdown)
  3. Always → Built-in (static HTML only, but always works)

Unlike tools.search(), fetch never throws "no provider available". The built-in fallback covers the zero-config case.
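
The resolution order can be sketched as a small pure function. This is an illustration rather than the package's actual implementation: the `Provider`, `FetchOptions`, and `resolveProvider` names are hypothetical, and the environment is passed in as a plain object so the sketch stays self-contained.

```typescript
// Hypothetical names: these mirror the options shown in Configuration but
// are not taken from the package's source.
type Provider = "firecrawl" | "jina" | "builtin";

interface FetchOptions {
  provider?: Provider;
  keys?: { firecrawl?: string; jina?: string };
}

type Env = Record<string, string | undefined>;

// A forced provider wins; otherwise try Firecrawl, then Jina, then built-in.
// Explicit keys take precedence over environment variables.
// Validating that a forced provider actually has a key is out of scope here.
function resolveProvider(options: FetchOptions, env: Env): Provider {
  if (options.provider) return options.provider;

  const firecrawlKey = options.keys?.firecrawl ?? env["FIRECRAWL_API_KEY"];
  if (firecrawlKey) return "firecrawl";

  const jinaKey = options.keys?.jina ?? env["JINA_API_KEY"];
  if (jinaKey) return "jina";

  return "builtin"; // always available, so resolution never fails
}

const chosen = resolveProvider({}, { JINA_API_KEY: "jina_example" });
console.log(chosen); // "jina"
```

In a Node.js program you would typically pass `process.env` as the second argument.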

Output shape

Every provider returns the same normalized result:

{
  url: "https://example.com/article",
  title: "Article Title",
  markdown: "# Article Title\n\nClean markdown content...",
  metadata: {
    statusCode: 200,
    contentType: "text/html",
    description: "Meta description if available",
    publishedDate: "2026-01-15",
    wordCount: 1247,
  },
  source: "firecrawl" // which provider was used
}
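
If you consume results in your own code, it may help to give the shape a type. The sketch below is an assumption based only on the example above: the `FetchResult` and `FetchMetadata` names are hypothetical, and marking `description` and `publishedDate` as optional is a guess from the "if available" wording.

```typescript
// Hypothetical type names; the field list follows the example result above.
interface FetchMetadata {
  statusCode: number;
  contentType: string;
  description?: string;   // assumed optional ("if available")
  publishedDate?: string; // assumed optional
  wordCount: number;
}

interface FetchResult {
  url: string;
  title: string;
  markdown: string;
  metadata: FetchMetadata;
  source: "firecrawl" | "jina" | "builtin";
}

// Build a short one-line summary of a fetched page, e.g. for logging.
function describeResult(result: FetchResult): string {
  const { title, url, source, metadata } = result;
  return `${title} (${url}) via ${source}, ${metadata.wordCount} words`;
}

const example: FetchResult = {
  url: "https://example.com/article",
  title: "Article Title",
  markdown: "# Article Title\n\nClean markdown content...",
  metadata: { statusCode: 200, contentType: "text/html", wordCount: 1247 },
  source: "firecrawl",
};

console.log(describeResult(example));
// "Article Title (https://example.com/article) via firecrawl, 1247 words"
```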

How the built-in fallback works

The built-in provider uses a three-step pipeline:

  1. Fetch: standard fetch() with a browser-like User-Agent
  2. Extract: @mozilla/readability strips navigation, ads, sidebars, and boilerplate, keeping just the article content (same library behind Firefox Reader View)
  3. Convert: turndown converts the cleaned HTML to markdown with ATX-style headings and fenced code blocks

This is the same pipeline Jina Reader uses internally. The difference is Jina also handles JavaScript-rendered pages via their ReaderLM model.
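
To make the extract and convert steps concrete, here is a dependency-free sketch of how they might compose, including a best-effort path for pages with no extractable article. The `htmlToMarkdown` function and the `Extract` and `Convert` types are hypothetical, and the toy steps are crude stand-ins; a real setup would wrap @mozilla/readability and turndown instead.

```typescript
// Hypothetical step signatures. In a real setup, `extract` would wrap
// @mozilla/readability and `convert` would wrap turndown.
interface Article {
  title: string;
  html: string;
}
type Extract = (html: string) => Article | null;
type Convert = (html: string) => string;

interface PageMarkdown {
  title: string;
  markdown: string;
  extracted: boolean; // false when the raw-body fallback was used
}

// Steps 2 and 3 of the pipeline, applied to HTML that step 1 already fetched.
function htmlToMarkdown(rawHtml: string, extract: Extract, convert: Convert): PageMarkdown {
  const article = extract(rawHtml);
  if (article === null) {
    // No readable article found: convert the raw body as a best effort.
    return { title: "", markdown: convert(rawHtml), extracted: false };
  }
  return { title: article.title, markdown: convert(article.html), extracted: true };
}

// Toy stand-ins so the sketch runs without dependencies.
const toyExtract: Extract = (html) => {
  const match = /<article>([\s\S]*?)<\/article>/.exec(html);
  return match ? { title: "Untitled", html: match[1] ?? "" } : null;
};
const toyConvert: Convert = (html) => html.replace(/<[^>]+>/g, "").trim();

const page = htmlToMarkdown(
  "<nav>Menu</nav><article><p>Hello</p></article>",
  toyExtract,
  toyConvert,
);
console.log(page.markdown); // "Hello"
```

Passing the steps in as functions keeps the composition easy to test and makes the fallback branch explicit.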

Direct provider constructors

If you want to skip auto-detection and lock to a specific provider:

import { firecrawlFetch, jinaFetch, builtinFetch } from "@flow-state-dev/tools";

// Always use Firecrawl (throws if no API key)
const firecrawlTool = firecrawlFetch({ keys: { firecrawl: "fc-..." } });

// Always use Jina (works without key at 20 RPM)
const jinaTool = jinaFetch();

// Always use built-in (never calls external services)
const builtinTool = builtinFetch();

A natural pattern: search first, then fetch the best results for full content.

const researcher = generator({
  name: "researcher",
  model: "anthropic/claude-sonnet-4-6",
  prompt: "Search for information, then read the most relevant pages to give thorough answers.",
  tools: [tools.search(), tools.fetch()],
});

The LLM will call search, scan the snippets, then fetch the pages that look most useful. You don't need to wire this up — the LLM figures out the workflow.

Error handling

| Scenario | Behavior |
| --- | --- |
| URL returns 404/500 | Throws an error. Generator retry handles transient failures. |
| URL redirects | Follows redirects automatically (standard fetch behavior) |
| Page has no readable content | Returns best-effort markdown (raw body conversion if Readability can't extract an article) |
| Firecrawl API error | Throws with the Firecrawl error message. Generator can retry. |
| Jina rate limited (429) | Throws. Generator retry will back off. |
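
If you wrap fetch calls in your own retry logic, a rough way to decide what to retry and how long to wait might look like the sketch below. Both helpers are hypothetical and do not describe how the generator's built-in retry works; treating 429 and 5xx as transient, and the specific delay values, are assumptions.

```typescript
// Assumption: 429 and 5xx responses are worth retrying; other 4xx are not.
function isTransientStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
// `attempt` starts at 0 for the first retry.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

console.log(isTransientStatus(429)); // true
console.log(isTransientStatus(404)); // false
console.log(backoffDelayMs(3));      // 4000
```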
