Agent Dev Loop
If you're a developer (or a coding agent) iterating on a flow, this is your loop. Edit a block, run `fsdev run`, read the NDJSON, repeat. The CLI runs the same `runAction` engine the production server uses, against the same stores, with structured events on stdout and runtime logs on stderr. No browser, no HTTP server, no mock harness.
The loop
- Edit. Change a block, sequencer, router, capability, or flow definition.
- Run. `pnpm fsdev run <flow> <action> -i '<json>'` from the repo root. Pass `--session <id>` for multi-turn behavior, `--model <id>` to swap the model, `--seed-session <json|path>` to start from specific state.
- Read. Stderr shows `[flow-state] *` runtime logs, which give the shape of execution. Stdout streams NDJSON events: `item_added`, `content_delta`, `state_change`, `flow_complete`, `error`. Pipe to `jq` for anything you want to inspect.
- Repeat. Tighten the loop with `--capture <path>` if you want a single file to diff between runs.
A worked example, "I'm adding a new tool to chat-agent":
```bash
# 1. Edit flows/chat-agent/blocks/my-new-tool.ts and wire it into the pipeline.

# 2. Smoke it.
pnpm fsdev run kitchen-sink chat-agent \
  -i '{"message":"use the new tool to do X","mode":"do"}' \
  --session new-tool-test \
  --capture /tmp/chat-run.json

# 3. Read what happened.
jq -c 'select(.type=="item_added" and .item.kind=="tool_call")' /tmp/chat-run.json
```
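If you would rather inspect a captured run from a script than from `jq`, the same filter can be sketched in TypeScript. This is a minimal sketch, not part of the CLI: it assumes the capture holds one JSON event per line, and that events carry `type` and `item.kind` as the `jq` filter above implies. `RunEvent`, `parseNdjson`, `toolCalls`, and `countByType` are hypothetical names, and the sample data is invented for illustration.

```typescript
// Assumed event shape, inferred from the jq filters in this doc.
// Any other fields are unknown and left untyped.
interface RunEvent {
  type: string;
  item?: { kind?: string };
  [key: string]: unknown;
}

// Parse NDJSON text (one JSON object per line), skipping blank lines.
function parseNdjson(text: string): RunEvent[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as RunEvent);
}

// Same selection as:
//   jq -c 'select(.type=="item_added" and .item.kind=="tool_call")'
function toolCalls(events: RunEvent[]): RunEvent[] {
  return events.filter(
    (e) => e.type === "item_added" && e.item?.kind === "tool_call",
  );
}

// Count events per type so two runs can be compared at a glance.
function countByType(events: RunEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    counts[e.type] = (counts[e.type] ?? 0) + 1;
  }
  return counts;
}

// Hypothetical sample standing in for the contents of /tmp/chat-run.json.
const sampleCapture = [
  '{"type":"item_added","item":{"kind":"tool_call"}}',
  '{"type":"content_delta","delta":"done"}',
  '{"type":"flow_complete"}',
].join("\n");

const capturedEvents = parseNdjson(sampleCapture);
console.log(toolCalls(capturedEvents).length);
console.log(countByType(capturedEvents));
```

Comparing the `countByType` output of two captures is a quick first check before diffing the files line by line.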
Reading the output
Stderr and stdout are separate channels on purpose. Stderr is for humans and agents skimming the run; stdout is for tools that parse it.
```bash
# Final result only
pnpm fsdev run ... 2>/dev/null | jq -c 'select(.type=="flow_complete")'

# All errors
pnpm fsdev run ... 2>/dev/null | jq -c 'select(.type=="error")'

# Just the assistant message text, reconstructed from streamed deltas.
# -j joins the outputs without adding newlines, so newlines inside the message survive.
pnpm fsdev run ... 2>/dev/null | jq -j 'select(.type=="content_delta") | .delta'
```
`--quiet` silences stderr when you only want the NDJSON. `--log-level debug` adds nested-block events for tracing inside sequencers and routers.
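For a script or agent that consumes stdout directly, the three filters above can be folded into one pass. The sketch below is illustrative only: it assumes each stdout line is one JSON object with a `type` field and that `content_delta` events carry a string `delta`, as the `jq` filters suggest. `StreamEvent`, `RunSummary`, and `summarizeRun` are hypothetical names, not part of the CLI.

```typescript
// Assumed shape of a streamed event; only `type` and `delta` are taken from this doc.
interface StreamEvent {
  type: string;
  delta?: unknown;
  [key: string]: unknown;
}

interface RunSummary {
  text: string;                    // assistant text rebuilt from content_delta events
  errors: StreamEvent[];           // every event with type "error"
  final: StreamEvent | undefined;  // the last flow_complete event, if any
}

// Walk the NDJSON once, collecting deltas, errors, and the final result.
function summarizeRun(ndjson: string): RunSummary {
  const parts: string[] = [];
  const errors: StreamEvent[] = [];
  let final: StreamEvent | undefined;

  for (const rawLine of ndjson.split("\n")) {
    const line = rawLine.trim();
    if (line.length === 0) continue; // tolerate blank lines and a trailing newline
    const event = JSON.parse(line) as StreamEvent;

    if (event.type === "content_delta" && typeof event.delta === "string") {
      parts.push(event.delta); // deltas are concatenated in arrival order
    } else if (event.type === "error") {
      errors.push(event);
    } else if (event.type === "flow_complete") {
      final = event;
    }
  }

  return { text: parts.join(""), errors, final };
}

// Hypothetical stdout from a run, for illustration only.
const sampleStdout = [
  '{"type":"content_delta","delta":"Hel"}',
  '{"type":"content_delta","delta":"lo"}',
  '{"type":"flow_complete"}',
].join("\n");

console.log(summarizeRun(sampleStdout).text);
```

Joining the deltas with an empty string keeps any newlines that are part of the message itself.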
Useful flag combinations
| Flag | What it does | When to reach for it |
|---|---|---|
| `-i, --input <json>` | Inline action input | Every run |
| `-f, --input-file <path>` | Read input from a JSON file | Long fixtures |
| `-s, --session <id>` | Reuse session state across invocations | Multi-turn flows |
| `--seed-session <json\|path>` | Pre-populate session state | Reproducing a specific bug state |
| `--seed-user <json\|path>` | Pre-populate user-scoped state | User-memory features |
| `--seed-org <json\|path>` | Pre-populate org-scoped state | Multi-tenant features |
| `-m, --model <id>` | Override the model for every generator | Cheap iteration, forcing a path |
| `--flow-dir <path>` | Restrict flow discovery (repeatable) | Monorepo with many candidate flows |
| `--capture <path>` | Write the full structured run output to a JSON file (additive with stdout) | Diffing runs, sharing a trace |
| `--quiet` | Suppress stderr runtime logs | Piping NDJSON cleanly |
| `--log-level <level>` | `debug` \| `info` \| `warn` \| `error` (default: `info`) | `debug` to trace inside sequencers |
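When an agent or test script drives the CLI, it can help to assemble the argument list from a typed object instead of a hand-quoted shell string. The following is a sketch under assumptions: `RunOptions` and `buildRunArgs` are hypothetical helpers, not something the CLI ships, and they cover only a subset of the flags in the table above.

```typescript
// Hypothetical options object; each field maps to one flag from the table.
interface RunOptions {
  flow: string;
  action: string;
  input?: unknown;        // -i, serialized to JSON
  inputFile?: string;     // -f
  session?: string;       // -s
  seedSession?: string;   // --seed-session (inline JSON or a path)
  model?: string;         // -m
  flowDirs?: string[];    // --flow-dir, repeatable
  capture?: string;       // --capture
  quiet?: boolean;        // --quiet
  logLevel?: "debug" | "info" | "warn" | "error"; // --log-level
}

// Build the argument list that would follow `pnpm` on the command line.
function buildRunArgs(opts: RunOptions): string[] {
  const args: string[] = ["fsdev", "run", opts.flow, opts.action];
  if (opts.input !== undefined) args.push("-i", JSON.stringify(opts.input));
  if (opts.inputFile) args.push("-f", opts.inputFile);
  if (opts.session) args.push("-s", opts.session);
  if (opts.seedSession) args.push("--seed-session", opts.seedSession);
  if (opts.model) args.push("-m", opts.model);
  for (const dir of opts.flowDirs ?? []) args.push("--flow-dir", dir);
  if (opts.capture) args.push("--capture", opts.capture);
  if (opts.quiet) args.push("--quiet");
  if (opts.logLevel) args.push("--log-level", opts.logLevel);
  return args;
}

// Example: the worked example's invocation, expressed as options.
const exampleArgs = buildRunArgs({
  flow: "kitchen-sink",
  action: "chat-agent",
  input: { message: "use the new tool to do X", mode: "do" },
  session: "new-tool-test",
  capture: "/tmp/chat-run.json",
});
console.log(exampleArgs.join(" "));
```

Passing the array to something like Node's `child_process.spawn("pnpm", exampleArgs)` sidesteps shell quoting of the JSON input, because each argument is handed over as its own string.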
When to switch tools
`fsdev run` is the right answer for flow-level changes. It is not the right answer for:
- Pure helpers, types, or schemas: use `pnpm test` (or `pnpm --filter <pkg> test`). Vitest is faster and asserts on values directly.
- Component rendering, streaming display, hydration: open the kitchen-sink app in a browser. NDJSON tells you the data is right; only a browser tells you the render is right.
- Diagnosing a failure: switch into the `debug-flow` skill. It has a failure-pattern matrix and the `fsdev block` isolation workflow for narrowing down which block broke.
The CLI is for verifying a change works. The skill is for figuring out why one doesn't.