Model Groups
How to use semantic model labels with automatic fallback across providers.
The Problem
Every production AI app needs model fallback. API keys expire, providers go down, rate limits hit. Hardcoding a single model ID means a single point of failure.
Model groups solve this. Instead of `model: "gpt-5.4"`, you write `model: "preset/fast"` or pass an array of models. The framework resolves to the best available model at execution time, retries on failure, and falls back to the next provider automatically.
Quick Start
```typescript
import { createModelResolver } from "@flow-state-dev/core/models";
import { generator } from "@flow-state-dev/core";

const resolver = createModelResolver({
  presets: {
    fast: { models: ["anthropic/claude-sonnet-4-6", "openai/gpt-5.4-mini", "google/gemini-3-flash"] },
  },
});

const chat = generator({
  name: "chat",
  model: "preset/fast",
  prompt: "You are a helpful assistant.",
});
```
"preset/fast" resolves to the first available model in the preset's list. No changes to your generator code — it's a drop-in replacement for any model reference.
Generators also support array fallback directly:
```typescript
const chat = generator({
  name: "chat",
  model: ["openai/gpt-5.4", "anthropic/claude-sonnet-4-6"],
  prompt: "You are a helpful assistant.",
});
```
Default Presets
Three built-in presets ship with the framework:
| Preset | Models (preference order) | Defaults |
|---|---|---|
| `fast` | anthropic/claude-sonnet-4-6, openai/gpt-5.4-mini, google/gemini-3-flash | maxTokens: 1024 |
| `thinking` | anthropic/claude-opus-4-6, openai/gpt-5.4, google/gemini-3.1-pro-preview | Anthropic extended thinking enabled |
| `balanced` | anthropic/claude-sonnet-4-6, openai/gpt-5.4, google/gemini-3-flash | None |
The first available model in each list is used. "Available" means the app has an API key for that provider (direct key or gateway).
Provider Detection
The model resolver auto-detects which providers are available by checking environment variables:
| Provider | Environment Variable |
|---|---|
| Anthropic | ANTHROPIC_API_KEY |
| OpenAI | OPENAI_API_KEY |
| Google | GOOGLE_GENERATIVE_AI_API_KEY |
If only ANTHROPIC_API_KEY is set and you use "preset/fast", it resolves to anthropic/claude-sonnet-4-6. If that key later fails, it skips to openai/gpt-5.4-mini — which won't be available either, so it moves to google/gemini-3-flash. If nothing works, you get a clear error listing what was tried.
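The detection walk can be sketched in a few lines. This is an illustration only, assuming the environment-variable names from the table above; `detectProviders` is not the framework's API:

```typescript
// Sketch: env-based provider detection. The env variable names match the
// table above; the function name `detectProviders` is illustrative.
type Provider = "anthropic" | "openai" | "google";

const ENV_KEYS: Record<Provider, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  google: "GOOGLE_GENERATIVE_AI_API_KEY",
};

function detectProviders(env: Record<string, string | undefined>): Provider[] {
  // A provider counts as available when its key variable is set and non-empty.
  return (Object.keys(ENV_KEYS) as Provider[]).filter((p) =>
    Boolean(env[ENV_KEYS[p]]),
  );
}
```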
Explicit Keys
Override auto-detection with explicit keys:
```typescript
const resolver = createModelResolver({
  keys: {
    anthropic: process.env.MY_ANTHROPIC_KEY,
    openai: process.env.MY_OPENAI_KEY,
  },
});
```
Gateways
Gateways are availability multipliers. A single gateway key makes all providers available without needing individual API keys.
Vercel AI Gateway
Zero-config on Vercel deployments. If AI_GATEWAY_API_KEY is set (or auto-provided via Vercel OIDC), all providers are available. Use the vercel/ prefix in model strings to route through the gateway:
"vercel/openai/gpt-5.4" — OpenAI via Vercel gateway
"vercel/anthropic/claude-sonnet-4-6" — Anthropic via gateway
The gateway is auto-detected from AI_GATEWAY_API_KEY even without explicit config. Just deploy to Vercel and it works.
OpenRouter
Uses the `OPENROUTER_API_KEY` environment variable.
Priority
Direct API keys take priority over gateways. If you have ANTHROPIC_API_KEY set and a Vercel gateway configured, Anthropic models use the direct key (lower latency, no intermediary). Other providers route through the gateway.
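The priority rule can be sketched as follows; the `resolveSource` function and its return shape are assumptions for illustration, not the library's API:

```typescript
// Sketch: a direct key always wins over a gateway; a gateway covers
// providers that lack a direct key; otherwise the provider is unavailable.
type Source = { kind: "key" } | { kind: "gateway"; gateway: string } | null;

function resolveSource(
  provider: string,
  directKeys: Record<string, string | undefined>,
  gateway?: string,
): Source {
  if (directKeys[provider]) return { kind: "key" }; // lower latency, no intermediary
  if (gateway) return { kind: "gateway", gateway }; // route through the gateway
  return null; // no key, no gateway: unavailable
}
```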
Custom Presets
Override defaults or add new presets:
```typescript
import { createModelResolver } from "@flow-state-dev/core/models";

const resolver = createModelResolver({
  presets: {
    // Override built-in
    fast: {
      models: ["openai/gpt-5.4-nano", "google/gemini-3.1-flash-lite-preview"],
      defaults: { maxTokens: 512 },
    },
    // Add new
    coding: {
      models: ["anthropic/claude-opus-4-6", "openai/gpt-5.4"],
      defaults: { maxTokens: 8192 },
    },
  },
});

const coder = generator({
  name: "coder",
  model: "preset/coding",
});
```
Preset Defaults
Preset defaults set baseline generation config. Caller config always wins:
```typescript
const resolver = createModelResolver({
  presets: {
    thinking: {
      models: ["anthropic/claude-opus-4-6", "openai/gpt-5.4"],
      defaults: {
        maxTokens: 4096,
        providerOptions: {
          anthropic: { thinking: { budgetTokens: 10000 } },
        },
      },
    },
  },
});
```
Provider-specific options are filtered at runtime. If thinking resolves to an OpenAI model, the anthropic provider options are stripped — they won't leak to the wrong provider.
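The filtering step might look like this sketch, assuming a config shape with a `providerOptions` map keyed by provider name; `filterProviderOptions` is an illustrative name, not the framework's implementation:

```typescript
// Sketch: strip providerOptions entries that don't match the provider the
// preset actually resolved to, so options never leak to the wrong provider.
interface GenConfig {
  maxTokens?: number;
  providerOptions?: Record<string, unknown>;
}

function filterProviderOptions(config: GenConfig, resolvedProvider: string): GenConfig {
  if (!config.providerOptions) return config;
  const kept = Object.fromEntries(
    Object.entries(config.providerOptions).filter(([p]) => p === resolvedProvider),
  );
  return { ...config, providerOptions: kept };
}
```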
Provider Preference
Presets encode a capability tier — "how capable, how fast." They do not encode a brand choice. If you want to say "I prefer Anthropic across the board," that is an orthogonal axis called provider preference.
Two axes, combined at resolution time:
| Axis | Answers | Set by |
|---|---|---|
| Preset / tier | "How capable?" | Preset author; model: "preset/x" |
| Provider preference | "Which brand, when multiple are available?" | User, flow, block, or skill |
The resolver walks the preset's list, but reorders it first: models from preferred providers come first (in the order you give), the rest come after in their original order. Availability filtering and retry/fallback run on the reordered list.
With createFSDProvider
Per-call preference:
```typescript
import { createFSDProvider, defaultGroups } from "@flow-state-dev/core/models";

const provider = createFSDProvider({ groups: defaultGroups });

// Default: preset's natural order
provider("balanced");

// Prefer Anthropic; fall back to the rest of the preset if no Anthropic model is available
provider("balanced", { prefer: "anthropic" });

// Ordered preference
provider("balanced", { prefer: ["anthropic", "google"] });
```
Provider-level default (applies to every call unless overridden):
```typescript
const provider = createFSDProvider({
  groups: defaultGroups,
  providerPreference: "anthropic",
});

provider("balanced"); // uses anthropic models first
provider("balanced", { prefer: "openai" }); // call-site wins
provider("balanced", { prefer: [] }); // explicit "no preference"
```
With createModelResolver
Set the default on the resolver; every "preset/x" string reorders accordingly:
```typescript
const resolver = createModelResolver({
  providerPreference: "anthropic",
});
```
Reordering example
Preset `preset/large` = `[openai/gpt-5.4, anthropic/opus, google/gemini-3, anthropic/sonnet]`.

| `prefer` | Order used |
|---|---|
| `undefined` | openai/gpt-5.4, anthropic/opus, google/gemini-3, anthropic/sonnet |
| `"anthropic"` | anthropic/opus, anthropic/sonnet, openai/gpt-5.4, google/gemini-3 |
| `["anthropic","google"]` | anthropic/opus, anthropic/sonnet, google/gemini-3, openai/gpt-5.4 |
Relative order within a provider bucket is preserved — opus stays before sonnet.
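The reordering rule can be sketched as a stable partition; `reorderByPreference` is an illustrative name, not the public API:

```typescript
// Sketch: preferred providers' models first (in the preference order given),
// then the remaining models in their original order. Order within each
// provider bucket is preserved.
function reorderByPreference(models: string[], prefer: string[]): string[] {
  const providerOf = (m: string) => m.split("/")[0];
  const preferred = prefer.flatMap((p) =>
    models.filter((m) => providerOf(m) === p),
  );
  const rest = models.filter((m) => !prefer.includes(providerOf(m)));
  return [...preferred, ...rest];
}
```

Availability filtering and retry/fallback would then run on the reordered list.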
Strict mode
By default, preference is a soft preference — if no preferred model is available, the rest of the preset is still tried. Opt in to strict mode for compliance-style use cases ("only ever Anthropic"):
provider("balanced", { prefer: "anthropic", strict: true });
Strict mode throws when no model from the preferred providers is available (either because the preset has none, or because the key/gateway for those providers is missing). The error message names the preset and the preferred providers.
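A sketch of the strict check, assuming availability has already been computed; the function name and error text are illustrative, not the framework's:

```typescript
// Sketch: strict mode picks only from preferred providers, and throws
// (naming the preset and preferred providers) when none is usable.
function pickStrict(
  candidates: string[],
  available: Set<string>,
  prefer: string[],
  preset: string,
): string {
  const hit = candidates.find(
    (m) => prefer.includes(m.split("/")[0]) && available.has(m),
  );
  if (!hit) {
    throw new Error(
      `No available model from preferred providers [${prefer.join(", ")}] in preset "${preset}"`,
    );
  }
  return hit;
}
```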
Precedence
Highest wins:
1. Call-site `{ prefer }` on `provider(...)`
2. `createFSDProvider({ providerPreference })` (or `createModelResolver({ providerPreference })`)
3. Nothing — preset's natural order (today's behavior; fully backward-compatible)

Call-site `prefer` is an override, not a merge. An empty array (`prefer: []`) explicitly means "no preference" — it does not re-inherit the provider-level default.
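The precedence rule reduces to a small resolution function. `resolvePreference` is an illustrative name; normalizing the result to an array is an assumption:

```typescript
// Sketch: call-site prefer overrides the provider-level default; an explicit
// empty array means "no preference" and does not re-inherit the default.
function resolvePreference(
  callSite: string | string[] | undefined,
  providerLevel: string | string[] | undefined,
): string[] {
  const pick = callSite !== undefined ? callSite : providerLevel;
  if (pick === undefined) return []; // preset's natural order
  return Array.isArray(pick) ? pick : [pick];
}
```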
Dynamic preference (per-input or per-user)
Use the existing `model: (input, ctx) => ...` callback form, not a new mechanism:

```typescript
const chat = generator({
  name: "chat",
  model: (input, ctx) =>
    provider("balanced", { prefer: ctx.user.state.preferredProvider }),
});
```
The framework has one dynamism mechanism. Call sites that need per-request preference read user state (or input, or any other source) and pass the result explicitly.
Introspection
`provider.explain(groupName, options?)` returns the ordered candidate list with availability status, plus the model the resolver would choose. Useful for debugging and for building UI selectors.

```typescript
provider.explain("balanced", { prefer: "anthropic" });
// {
//   preset: "balanced",
//   prefer: ["anthropic"],
//   candidates: [
//     { modelId: "anthropic/claude-sonnet-4-6", providerName: "anthropic",
//       available: true, source: "key" },
//     { modelId: "openai/gpt-5.4", providerName: "openai",
//       available: true, source: "gateway", gateway: "vercel" },
//     { modelId: "google/gemini-3-flash", providerName: "google",
//       available: false, reason: "no-key-no-gateway" },
//   ],
//   willUse: "anthropic/claude-sonnet-4-6",
// }
```
Retry and Fallback
The fallback behavior is configurable:
```typescript
const resolver = createModelResolver({
  retryPolicy: {
    maxAttemptsPerModel: 3, // default: 2
    baseDelayMs: 500, // default: 1000
    maxDelayMs: 15000, // default: 10000
  },
});
```
When a model call fails:
- If the error is retryable (429, 500, 502, 503, network errors), retry the same model with exponential backoff
- After `maxAttemptsPerModel` attempts, move to the next model in the list
- Non-retryable errors (auth failures, bad requests) skip directly to the next model
- If all models are exhausted, throw with a summary of every error
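The loop above can be sketched as follows. The defaults mirror the `retryPolicy` config shown earlier; the implementation itself is illustrative, not the framework's:

```typescript
// Sketch: retry a model with exponential backoff on retryable errors,
// skip to the next model otherwise, throw when everything is exhausted.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 10000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs); // attempt 0 -> base, 1 -> 2x, ...
}

async function callWithFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttemptsPerModel = 2,
): Promise<T> {
  const errors: unknown[] = [];
  for (const model of models) {
    for (let attempt = 0; attempt < maxAttemptsPerModel; attempt++) {
      try {
        return await call(model);
      } catch (err) {
        errors.push(err);
        if (!isRetryable(err)) break; // auth/bad-request: straight to next model
        await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
      }
    }
  }
  throw new Error(`All models failed (${errors.length} errors)`);
}
```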
Streaming
Streaming uses a simpler fallback: if a stream fails before yielding its first chunk, the next model is tried. Mid-stream failures propagate to the caller — there's no way to transparently resume a stream from a different model.
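The first-chunk rule can be sketched with an async generator; `streamWithFallback` is an illustrative name, not the framework's API:

```typescript
// Sketch: try each model's stream in order; fall back only if a stream
// fails before its first chunk. Mid-stream errors propagate to the caller.
async function* streamWithFallback(
  models: string[],
  open: (model: string) => AsyncIterable<string>,
): AsyncGenerator<string> {
  let lastError: unknown;
  for (const model of models) {
    let yieldedFirst = false;
    try {
      for await (const chunk of open(model)) {
        yieldedFirst = true;
        yield chunk;
      }
      return; // stream completed
    } catch (err) {
      if (yieldedFirst) throw err; // mid-stream failure: no transparent resume
      lastError = err; // pre-first-chunk failure: try the next model
    }
  }
  throw lastError ?? new Error("no models to stream from");
}
```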
Model String Format
Model strings use slash format:
| Format | Example | Description |
|---|---|---|
| `provider/model` | `"openai/gpt-5.4"` | Direct provider |
| `gateway/provider/model` | `"vercel/openai/gpt-5.4"` | Via gateway |
| `preset/name` | `"preset/fast"` | Built-in preset |
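Parsing these formats is a small exercise; the returned shape and the gateway whitelist below are assumptions for illustration, not the library's internals:

```typescript
// Sketch: classify a model string into the three slash formats above.
type ModelRef =
  | { kind: "preset"; name: string }
  | { kind: "direct"; provider: string; model: string }
  | { kind: "gateway"; gateway: string; provider: string; model: string };

// Assumed whitelist of known gateway prefixes.
const GATEWAYS = new Set(["vercel", "openrouter"]);

function parseModelString(s: string): ModelRef {
  const parts = s.split("/");
  if (parts[0] === "preset") {
    return { kind: "preset", name: parts.slice(1).join("/") };
  }
  if (GATEWAYS.has(parts[0]) && parts.length >= 3) {
    return {
      kind: "gateway",
      gateway: parts[0],
      provider: parts[1],
      model: parts.slice(2).join("/"),
    };
  }
  return { kind: "direct", provider: parts[0], model: parts.slice(1).join("/") };
}
```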
Resolver Introspection
Check what's available at runtime:
```typescript
resolver.presets(); // ["fast", "thinking", "balanced"]
resolver.available("fast"); // ["anthropic/claude-sonnet-4-6", "openai/gpt-5.4-mini"]
```

`available()` returns only the models in a preset that have a working provider configured.
Dynamic Model Selection
Use a function for `model` to pick presets based on input:

```typescript
const adaptive = generator({
  name: "adaptive",
  model: (input, ctx) => {
    return input.needsReasoning
      ? "preset/thinking"
      : "preset/fast";
  },
});
```
Relationship to Model Resolver
`createModelResolver` handles both model resolution and presets in a unified API:

- Model strings like `"openai/gpt-5.4"` are resolved to concrete AI SDK model instances
- Presets like `"preset/fast"` resolve through the preset's model list with built-in fallback
- Array fallback like `["openai/gpt-5.4", "anthropic/claude-sonnet-4-6"]` tries models in order
Zero-config usage auto-detects providers from environment variables:
```typescript
const resolver = createModelResolver();
```