Wardwright Vision

Wardwright is a synthetic model platform for agentic systems. A caller asks for a stable model name, such as coding-balanced or wardwright/json-extractor; Wardwright decides what that name means today, records why, and exposes the result through an OpenAI-compatible interface.

Current status: this is a product vision and implementation plan, not a finished release. The repo is intentionally using docs-driven development: write down the contract, build prototypes against it, test the contract, then choose the production foundation using evidence.

Product Shape

In the finished product, Wardwright should let an operator define synthetic models as shareable artifacts:

The key distinction is that the public model name is not just an alias for one provider model. It is a governed contract.

Wardwright has two major inspirations. XBOW’s model-alloy idea motivates synthetic models that can combine multiple LLMs into one coherent agent-facing behavior instead of treating “model” as a single provider ID. oh-my-pi’s Time Traveling Streamed Rules motivate output-stream governance that starts with zero prompt-context cost, then aborts and retries only when the model begins to emit a triggering pattern. See Synthetic Model Composition for the route primitives and source links.

Requirements Direction

The first product audience is technical policy authors: people who can reason about model/provider behavior, receipts, rules, and rollout risk, but who should not need to be backend developers to author or review policy. Nontechnical authoring help remains valuable, but it should be delivered through guided assistant workflows, summaries, simulation, and visualization rather than by weakening the underlying governance model.

Near-term product priority is correctness plus policy authoring quality. That means Wardwright should optimize for:

The first coherent policy set should cover:

Policy authoring should remain UI- and config-centered. Natural language, assistant review, visual graphs, simulation, and deterministic YAML/TOML/JSON artifacts are the primary workflow. Direct snippets are appropriate when they map closely to a rule or runtime hook, but arbitrary programmable policy should be evaluated as an advanced authoring mode, not assumed to be the default.

The major open product question is the primary policy authoring model. Wardwright should run comparable spikes for structured policy primitives and code-first Starlark policy. Both spikes must use the same realistic policy scenarios so the decision is based on authoring clarity, simulation quality, review safety, and debugging speed rather than raw expressiveness.

Request Flow

1. Caller sends a normal OpenAI-style chat completion request.
2. Wardwright resolves the synthetic model name and captures caller provenance.
3. Policy can transform the prompt, pick a route, validate stream events, retry, alert, or stop.
4. Provider receives only the concrete request Wardwright has chosen to send.
5. Receipt records the route, policy actions, timing, usage, and final outcome.

Early Policy Demos

The strongest demonstrations are not generic chatbot filters. They are constrained workflows with known failure modes:

Security policies still matter, especially around prompt injection and leakage, but they are one family of governance examples rather than the product’s whole identity.

Integration Strategy

Wardwright should work in two deployment shapes:

The preferred public namespace is flat when Wardwright owns the whole model catalog and prefixed, such as wardwright/coding-balanced, when Wardwright is one provider inside a larger gateway.

Implementation Direction

The repository initially kept Go, Rust, and Elixir backend prototypes alive so they could be measured against the same contract. That comparison has now served its purpose. The current implementation direction is Elixir plus Gleam on the BEAM: Elixir owns supervision, process boundaries, HTTP, LiveView, provider IO, telemetry, PubSub visibility, and dynamic runtime behavior; Gleam owns correctness-heavy pure decision logic where typed data and exhaustive pattern matching can prevent policy bugs. Session/model processes should publish receipt, policy, queue, and simulation events over Phoenix PubSub so LiveView and other cluster nodes can observe runtime behavior without directly mutating another node’s hot session state once a deployment has explicit node discovery and clustering configured. The Go and Rust backends remain in git history as evidence, but they are no longer carried in the live tree. Systems such as LiteLLM, TensorZero, and Helicone remain useful integration targets and sources of product ideas, but Wardwright should not begin life as a long-running fork of any of them.

The first durable implementation should prioritize:

  1. a small, clear OpenAI-compatible gateway surface
  2. ETS-backed runtime state plus file-backed durable receipt storage and queryability, with Mnesia or SQL stores introduced when they buy concrete replication, query, concurrency, or deployment behavior
  3. PubSub-backed visibility for LiveView and cluster projections, while authoritative session mutation remains owned by one runtime subtree
  4. policy hooks for request, route, stream, and output phases
  5. a policy-engine contract with explicit state scopes and two execution tiers: Dune-backed BEAM snippets for trusted local policy, and WASM/sidecar/hosted execution for externally shared or untrusted policy
  6. a LiveView-first UI that exposes model definitions, live behavior, receipts, policy outcomes, simulations, and policy-shape explanations

See Use Cases for the first policy examples that should drive tests, docs, and UI workflows.