Synthetic Model Composition

Wardwright synthetic models are stable public model IDs backed by a route graph, not aliases for one upstream provider. The first route primitives are inherited from Calciforge’s model gateway work and are deliberately small: dispatchers, cascades, and alloys.

Two outside ideas define the shape of this work. XBOW’s “model alloy” writeups show that alternating multiple LLMs inside one agent context can outperform a single-model monoculture on agentic search tasks. oh-my-pi’s Time Traveling Streamed Rules show that rules can sit outside the prompt until model output actually triggers them, then abort and retry with a targeted reminder.

References:

Dispatcher

Dispatchers route on the shape of the request. The current implementation compares the estimated prompt token count against each target's declared context window. A small request can use a local model; a larger request is promoted to a bigger-context model without the caller changing the public model name.

{
  "route_root": "fit-dispatcher",
  "targets": [
    {"model": "local/qwen", "context_window": 32768},
    {"model": "managed/kimi", "context_window": 262144}
  ],
  "dispatchers": [
    {
      "id": "fit-dispatcher",
      "models": ["local/qwen", "managed/kimi"]
    }
  ]
}
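
The fit check described above can be sketched in a few lines of Python. This is a minimal illustration, not Wardwright's implementation: the `Target` type and `dispatch` function are hypothetical names, the token estimate is taken as an input, and picking the smallest declared window that fits is an assumption about tie-breaking that the configuration does not confirm.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Target:
    """Hypothetical mirror of one entry in the "targets" list."""
    model: str
    context_window: int


def dispatch(targets: Sequence[Target], estimated_prompt_tokens: int) -> Optional[str]:
    """Return the model with the smallest declared window that still fits.

    Assumption: "fits" means the estimated prompt tokens do not exceed
    the declared context window. Returns None when nothing fits.
    """
    for target in sorted(targets, key=lambda t: t.context_window):
        if estimated_prompt_tokens <= target.context_window:
            return target.model
    return None


TARGETS = [
    Target("local/qwen", 32768),
    Target("managed/kimi", 262144),
]

small = dispatch(TARGETS, 1_000)      # fits the local model
large = dispatch(TARGETS, 100_000)    # exceeds 32768, promoted
too_big = dispatch(TARGETS, 300_000)  # exceeds every declared window
```

In this sketch the caller never names a provider model; only the estimate changes which target is returned.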

Expected behavior:

Policy Control

The route graph is the baseline model definition. Route policy runs before provider selection and can narrow or override that baseline:

Built-in declarative route gates and Dune-backed policy snippets can both emit these actions. WASM remains fail-closed until the runtime is enabled.

Cascade

Cascades are reliability plans. They preserve declaration order and skip impossible targets before a provider attempt is made.

{
  "route_root": "local-then-remote",
  "cascades": [
    {
      "id": "local-then-remote",
      "models": ["local/qwen", "managed/kimi", "managed/reserve"]
    }
  ]
}
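
As a rough sketch of the planning step, the function below filters a cascade before any provider is contacted. It assumes that an "impossible" target is one whose declared context window is smaller than the estimated prompt, and that a target with no declared window is kept; both are assumptions for illustration, and `plan_cascade` is a hypothetical name.

```python
from typing import Dict, List


def plan_cascade(
    models: List[str],
    context_windows: Dict[str, int],
    estimated_prompt_tokens: int,
) -> List[str]:
    """Return the attempt order for a cascade.

    Declaration order is preserved. A model is skipped only when its
    declared window is known and too small for the estimated prompt
    (an assumed definition of "impossible").
    """
    plan = []
    for model in models:
        window = context_windows.get(model)
        if window is not None and estimated_prompt_tokens > window:
            continue  # skipped before any provider attempt is made
        plan.append(model)
    return plan


CASCADE = ["local/qwen", "managed/kimi", "managed/reserve"]
WINDOWS = {"local/qwen": 32768, "managed/kimi": 262144}

short_plan = plan_cascade(CASCADE, WINDOWS, 1_000)
long_plan = plan_cascade(CASCADE, WINDOWS, 100_000)
```

With a short prompt the plan keeps all three models in declared order; with a 100,000-token estimate the local model is dropped and the remaining two keep their relative order.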

Expected behavior:

Alloy

Alloys compose several constituent models so that they behave as one synthetic model. Wardwright supports ordinary alloys, whose constituents have compatible context windows, and a deliberate partial-context mode for the user-facing “local plus long-context” use case.

{
  "route_root": "local-kimi-partial",
  "alloys": [
    {
      "id": "local-kimi-partial",
      "strategy": "deterministic_all",
      "partial_context": true,
      "constituents": ["local/qwen", "managed/kimi"]
    }
  ]
}

Expected behavior:

Weighted alloys model stochastic composition while keeping test runs reproducible inside the pure route planner. Caller-supplied request metadata must not control provider selection in the serving path.
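
One way to get reproducible weighted selection is to drive the choice from a seed owned by the planner or test harness rather than from anything in the request. The sketch below assumes a hypothetical weight map and a hypothetical `pick_constituent` function; the source does not show the weighted alloy configuration, so the field names here are illustrative only.

```python
import random
from typing import Dict


def pick_constituent(weights: Dict[str, float], seed: int) -> str:
    """Pick one constituent with probability proportional to its weight.

    The seed is supplied by the planner or a test, never derived from
    caller-supplied request metadata, so the same seed always yields
    the same pick.
    """
    rng = random.Random(seed)  # isolated generator, no global state
    models = list(weights)
    return rng.choices(models, weights=[weights[m] for m in models], k=1)[0]


# Hypothetical weights for illustration only.
WEIGHTS = {"local/qwen": 0.7, "managed/kimi": 0.3}

first = pick_constituent(WEIGHTS, seed=42)
second = pick_constituent(WEIGHTS, seed=42)
```

Because the function has no hidden state, `first` and `second` are always equal, which is what keeps planner tests repeatable while the distribution across seeds stays weighted.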