Policy Engine MVP

Wardwright’s policy system should be designed from concrete agent failures, not from a general-purpose scripting fantasy. The first product question is:

What is the smallest policy model that can prevent, repair, reroute, or make visible the failures we actually expect in constrained agent workflows?

The current prototype has a small built-in request policy engine. The next step is to define the durable execution model that can support request, route, stream, output, and history-aware policies without adding unbounded overhead to every synthetic model call.

The live BEAM implementation now has a first policy-plan boundary: Wardwright.Policy.Plan evaluates request-phase governance rules and emits a transformed request plus policy actions, route constraints, alerts, and block state. This is still interpreted from the runtime config, not a fully compiled artifact, but it deliberately moves policy semantics out of the HTTP router so compiled plans, traces, and projections can share one execution contract.
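
As a rough sketch of what that boundary can look like, assuming a plain reduce over request-phase rules (the module name, result fields, and rule shape below are illustrative, not the shipped API):

defmodule PolicyPlanSketch do
  # Hypothetical result shape: the transformed request plus everything the
  # router, receipts, and projections need from the request phase.
  defmodule Result do
    defstruct request: nil,
              policy_actions: [],
              route_constraints: [],
              alerts: [],
              blocked: false
  end

  @doc "Evaluates request-phase rules (as check functions) against a request."
  def evaluate(rules, request) do
    Enum.reduce(rules, %Result{request: request}, fn rule, acc ->
      case rule.check.(acc.request) do
        :pass -> acc
        {:transform, req} -> %{acc | request: req}
        {:constrain_route, c} -> %{acc | route_constraints: acc.route_constraints ++ [c]}
        {:action, a} -> %{acc | policy_actions: acc.policy_actions ++ [a]}
        {:alert, a} -> %{acc | alerts: acc.alerts ++ [a]}
        :block -> %{acc | blocked: true}
      end
    end)
  end
end

In this sketch a blocked plan still accumulates later actions and alerts, so receipts can show every rule that fired, not only the first blocker.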

Design Bias

Start with declarative and built-in policies for common cases. Add Starlark as the first advanced portable language only after the execution phases, state scopes, and action model are stable.

Policy language is less important than the policy ABI: the execution phases, state scopes, and action model that every engine must target.

Action Contract

Request-phase policy engines now normalize returned actions through Wardwright.Policy.Action before the router, receipts, or UI see them. The current contract is wardwright.policy_action.v1; engine-level results use wardwright.policy_result.v1.

Every normalized action carries the same versioned contract fields, regardless of which engine produced it.

Receipts expose decision.policy_actions in that shape and summarize detected same-key ordered conflicts in decision.policy_conflicts. This is intentionally not a UI-specific format; it is the backend projection surface that the workbench, simulation traces, and future AI-assisted authoring flow should consume. Route constraints are still applied in policy declaration order.

The user experience should be visual, conversational, and simulation-first. The deterministic policy artifact is the storage and review format, not the primary interface. Operators should be able to describe the behavior they want, let an AI-assisted authoring flow draft a rule, inspect the compiled policy graph, run generated simulations, and approve the exact artifact before activation.

Initial Use-Case Matrix

| Use case | Policy phase | Required data | Required state | Actions |
|---|---|---|---|---|
| Ambiguous success | output | final response text, structured fields, expected artifact metadata | current call | alert, block final, annotate receipt |
| Structured output repair | output | stream, final buffered output, parser/schema errors | current attempt | retry with correction, block, annotate |
| Deprecated pattern / TTSR | response stream | bounded stream window, rule match offsets | current attempt | withhold, rewrite, retry with reminder |
| Tool loop / tool spam | request, route, final | tool call name, args hash, result hash, route attempts | run/session | inject reminder, reroute, alert, stop |
| Context/cost budget | request, route, final | token estimate, provider, usage, latency | run/session, optional tenant budget | route, require approval, stop, alert |
| Prompt experiment guardrails | request, final | prompt transform version, route, output verdict | aggregate receipts, not hot path | annotate, compare, rollback candidate |
| Abuse/DOS | request, route | caller identity, rate, concurrent runs, tokens | tenant/user/agent windows | rate-limit, reject, degrade, alert |

MVP should target the first five. Prompt-experiment analytics and DOS controls should influence the state interface, but they do not need full implementation before the policy engine becomes useful.

Execution Phases

Policies should run at explicit phases. Each phase receives only the data it needs unless the synthetic model asks for more; a sketch of the dispatch contract follows the phase list.

  1. request.received
    • Inputs: normalized request, caller context, metadata, estimated tokens.
    • Useful for prompt transforms, request guards, budget checks.
    • No model output available.
  2. route.selecting
    • Inputs: request facts, candidate routes, provider health, scoped counters.
    • Useful for route gates, budget-aware routing, model capability checks.
    • May return route overrides or approval requirements.
    • Current implementation supports restrict_routes, switch_model, and reroute as planner constraints recorded in receipts as policy_route_constraints.
  3. response.streaming
    • Inputs: normalized stream events plus bounded ring buffer.
    • Useful for time-travel stream rewriting, regex/literal guards, structured partial parsing, early stop/retry.
    • Must have explicit latency and memory budgets.
  4. output.finalizing
    • Inputs: full or bounded final output, schema/parser verdicts, attempt metadata.
    • Useful for structured output repair, ambiguous success detection, final block/alert decisions.
  5. receipt.finalized
    • Inputs: final receipt, policy actions, usage, route attempts.
    • Useful for sinks, analytics, experiments, and non-blocking alerts.
    • Should not mutate the already-returned user result.
    • Asynchronous human/operator alerts live here unless an earlier phase explicitly returned a blocking approval action.
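
A minimal dispatch sketch of that contract, assuming each policy declares which input keys it wants per phase (all names below are illustrative):

defmodule PhaseDispatchSketch do
  @phases [:"request.received", :"route.selecting", :"response.streaming",
           :"output.finalizing", :"receipt.finalized"]

  @doc """
  Runs a policy only for phases it declared, passing only the input keys the
  policy registered for that phase, so unused data never crosses the boundary.
  """
  def run(policy, phase, input) when phase in @phases do
    case policy.inputs[phase] do
      nil -> :skip
      keys -> policy.handler.(phase, Map.take(input, keys))
    end
  end
end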

Time-Travel Stream Rewriting

TTSR is best treated as a stream-phase policy mode, not a separate feature. The operator configures a bounded holdback window:

stream_policy:
  mode: buffered_horizon
  holdback_bytes: 4096
  max_added_latency_ms: 250
  rules:
    - id: deprecated-client
      match:
        contains: "OldClient("
      action:
        type: retry_with_reminder
        reminder: "Do not use OldClient. Use NewClient instead."

The consumer receives data only after it has passed the holdback horizon. If a rule fires before release, Wardwright can drop, rewrite, retry, or escalate without letting the violating output reach the consumer.

MVP guardrails: the holdback window needs explicit byte and latency budgets (holdback_bytes, max_added_latency_ms), and exceeding a budget must fail closed rather than silently release unchecked bytes.

The exact term “time-travel stream rewriting” does not appear to be a common public product category, but the underlying design space is visible in existing guardrail systems.

Wardwright should adapt the best parts of those systems while making the promise more explicit: a stream rule that fires inside the holdback window must be able to keep the violating bytes from ever reaching the consumer.

TTSR maps most closely to buffered_horizon plus actions such as retry_with_reminder, rewrite_chunk, drop_chunk, block_final, and alert. The receipt must say which mode was active, how many bytes/tokens were held back, whether violating bytes were released, and which retry or rewrite action fired.

The current BEAM prototype implements a narrow stream-policy foothold: selected-target provider chunks can be checked against literal or regex rules before release, rewritten or dropped, blocked before any SSE bytes are sent, or retried with a reminder while preserving the failed attempt as unreleased receipt evidence. Test-only mock chunks still exist as an escape hatch, but the router can now call the selected provider through Wardwright.ProviderRuntime for each stream attempt and retry the provider boundary before releasing bytes. Ollama NDJSON streams and OpenAI-compatible SSE streams are parsed from native HTTP streaming transport messages before policy evaluation. Receipts record the stream policy action, trigger count, trigger events, retry count, per-attempt provider status, and generated/released/held/rewritten/blocked byte counts.

Split chunk boundaries are treated cautiously. Terminal block/retry checks and rewrite checks inspect the buffered stream window, so a pattern such as OldClient( is still caught if a provider emits Old and Client( in separate events. Non-native streaming provider fallbacks are treated as one held response unit rather than artificially chunked output.

Rules may opt into bounded horizon semantics by setting horizon_bytes or holdback_bytes. When every active stream rule declares a horizon, the evaluator can release UTF-8-safe safe prefixes while retaining the recent overlap window needed to detect split literal or regex matches. If any active stream rule omits a horizon, the evaluator keeps the older full-buffer behavior so an unbounded rule cannot accidentally miss a cross-chunk trigger.
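
One way to implement that release rule, as a sketch (names are hypothetical, and every active rule is assumed to declare a horizon): release everything except the trailing horizon bytes, walking the cut point back so a multi-byte UTF-8 character is never split.

defmodule HorizonSketch do
  import Bitwise

  @doc """
  Splits the held buffer into {releasable, still_held}, keeping the last
  `horizon` bytes as the overlap window for cross-chunk matches.
  """
  def split_releasable(held, horizon) when byte_size(held) <= horizon, do: {"", held}

  def split_releasable(held, horizon) do
    cut = utf8_safe_cut(held, byte_size(held) - horizon)
    <<release::binary-size(cut), keep::binary>> = held
    {release, keep}
  end

  # Move the cut left while it lands on a UTF-8 continuation byte
  # (0b10xxxxxx), so a multi-byte codepoint is never split across the cut.
  defp utf8_safe_cut(_held, 0), do: 0

  defp utf8_safe_cut(held, cut) do
    if (:binary.at(held, cut) &&& 0b1100_0000) == 0b1000_0000 do
      utf8_safe_cut(held, cut - 1)
    else
      cut
    end
  end
end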

Rules may also declare max_hold_ms to bound how long the oldest unreleased bytes may remain in the stream policy window. Wardwright applies the strictest declared budget across active stream rules. If the budget is exceeded before the held bytes are released, the attempt fails closed with stream_policy_latency_exceeded, cancels the provider attempt, and records max_hold_ms plus max_observed_hold_ms in the stream-policy receipt.

The evaluator now exposes the same behavior as an incremental arbiter: initialize state for a ruleset, consume one normalized provider chunk, receive the newly releasable output chunks for that step, and finish the stream to flush the held suffix. This gives the router a deterministic policy boundary for a runtime implementation that sends SSE while continuing to enforce the holdback window.
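
The arbiter surface described above could look roughly like the following sketch, where a simple contains/regex check and a fixed-horizon split stand in for the real rule evaluation (all names assumed):

defmodule StreamArbiterSketch do
  @doc "Initialize evaluator state for a set of stream rules."
  def init(rules), do: %{rules: rules, held: ""}

  @doc """
  Consume one normalized provider chunk; returns newly releasable bytes plus
  updated state, or a terminal decision when a rule fires before release.
  """
  def consume(state, chunk) do
    held = state.held <> chunk

    case triggered_rule(state.rules, held) do
      nil ->
        {release, keep} = split(held, horizon(state.rules))
        {:release, release, %{state | held: keep}}

      rule ->
        {:halt, rule.action, %{state | held: held}}
    end
  end

  @doc "Finish the stream: run terminal checks, then flush the held suffix."
  def finish(state) do
    case triggered_rule(state.rules, state.held) do
      nil -> {:release, state.held}
      rule -> {:halt, rule.action}
    end
  end

  defp triggered_rule(rules, held), do: Enum.find(rules, fn r -> held =~ r.pattern end)

  defp horizon(rules), do: rules |> Enum.map(& &1.horizon_bytes) |> Enum.max()

  # Keep the trailing horizon bytes held; see the UTF-8-safe cut sketched earlier.
  defp split(held, horizon) when byte_size(held) <= horizon, do: {"", held}

  defp split(held, horizon) do
    cut = byte_size(held) - horizon
    <<release::binary-size(cut), keep::binary>> = held
    {release, keep}
  end
end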

The HTTP route now uses that boundary for live streaming. Provider transports emit normalized chunks into the router as they arrive. Bounded-horizon policies can release safe prefixes, keep the recent match window held, and cancel the provider attempt when a later trigger fires. If the response has already started, the client receives a terminal Wardwright SSE event and the receipt records the policy status. If no bytes have been sent, Wardwright can still fail closed with JSON.

Remaining stream-runtime work includes richer raw provider-event offsets in receipts, provider-specific pools, and a clearer operator-facing model for retries after any bytes have reached the client.

State Scopes

Some policies need history, but not all history belongs in every policy call. Policy definitions should declare state requirements so Wardwright can avoid unnecessary tracking.

| Scope | Examples | MVP? |
|---|---|---|
| attempt | current request/output, stream window, parser state | yes |
| run | retries, selected routes, repeated model attempts | yes |
| session | repeated tool-call hashes across a user session | yes |
| caller_agent | per-agent rolling error/cost counters | later |
| caller_user | per-user budget and abuse windows | later |
| tenant | tenant-wide rate/cost/DOS control | later |
| global | marketplace abuse, fleet-level anomaly detection | no |

MVP state should be limited to attempt, run, and session. Those scopes support loop detection, retry limits, TTSR, structured repair, and ambiguous success without turning Wardwright into a full observability warehouse.

That limit is a starting boundary, not a claim that all valuable policy state is session-local. Some future rules may never need data outside the current session; others may be explicitly about peer sessions, caller-level abuse, fleet-wide model failures, or tenant budget windows. Policy definitions should make that scope choice explicit so the runtime can provision the right history surface and the UI can explain the privacy, latency, and consistency tradeoff.

State API Shape

Policies should not query arbitrary storage. They should receive named, precomputed facts and scoped counters declared in the model definition.

Example:

state_requirements:
  - id: recent_tool_hashes
    scope: session
    kind: rolling_counter
    key: "tool:{name}:{args_hash}:{result_hash}"
    ttl_seconds: 1800
    max_entries: 256
  - id: retry_count
    scope: run
    kind: counter
    key: "route_retry:{synthetic_model}"

Policy code then sees:

{
  "state": {
    "recent_tool_hashes": {
      "current_count": 3,
      "window_seconds": 1800
    },
    "retry_count": {
      "current_count": 1
    }
  }
}

This keeps policy deterministic and makes cost visible. It also lets the UI explain why a synthetic model has higher overhead.
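
For the rolling_counter kind, a deliberately small ETS-backed sketch (key-template expansion, scope routing, and max_entries eviction are omitted; none of these names are the shipped API):

defmodule RollingCounterSketch do
  @doc "Creates a new counter table; `table_name` must be an atom."
  def new(table_name), do: :ets.new(table_name, [:set, :public])

  @doc "Records a hit for `key` and returns the count inside `ttl_seconds`."
  def hit(table, key, ttl_seconds, now \\ System.monotonic_time(:second)) do
    stamps =
      case :ets.lookup(table, key) do
        [{^key, old}] -> old
        [] -> []
      end

    # Keep only timestamps still inside the rolling window, then add this hit.
    fresh = for t <- stamps, now - t < ttl_seconds, do: t
    :ets.insert(table, {key, [now | fresh]})
    length(fresh) + 1
  end
end

A policy fact like recent_tool_hashes.current_count then reduces to hit(table, "tool:browser:h1:r1", 1800) at event-recording time, not at query time.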

History Access And Metadata

Policy code will sometimes need data outside the current request. Tool-loop detection needs recent tool attempts in the same run or session. Budget control may need rolling token spend for a session, agent, user, or tenant. DOS controls eventually need broader caller windows.

There are three possible designs:

  1. Expose the general Wardwright read API to policy code.
  2. Expose a bounded policy facts API backed by receipts, counters, and indexes.
  3. Expose a Wardwright-API-shaped query facade backed only by configured policy caches and ring buffers.

The third approach is probably the best ergonomic compromise. Policy authors can use query shapes that resemble the ordinary Wardwright read API, but the policy runtime only serves data that the synthetic model explicitly declared as part of its hot working set.

Direct access to the full historical GET surface is attractive because it is flexible, but it creates hot-path problems: every policy call can become an unbounded storage query, and latency grows with stored history instead of with the declared working set.

Instead, model definitions should declare the facts or recent-record caches they need. Wardwright can then decide what to track, how to index it, and how to expose it to the policy engine.

The preferred BEAM hot-path shape is a declared working set that is maintained as events are recorded and read without catalog-wide fanout. Policy enforcement should not normally scan every active session table through the catalog. That fanout is useful for operator browsing, simulation, migration checks, and small bounded experiments, but it makes request latency scale with active session count. Cross-session enforcement rules should instead consume a declared aggregate/index maintained when events are recorded.

This also gives policy authors a clearer mental model: declared facts and caches are the only data a policy can see on the hot path, and everything outside the declared working set is simply not available there.

The UI should surface these distinctions before a rule is enabled.

Policies that enforce behavior need a stronger cache contract than policies that merely annotate receipts. Wardwright should distinguish two classes: enforcement policies, whose reads must be deterministic because they change model behavior, and observability policies, which only annotate receipts and can tolerate weaker cache guarantees.

MVP should use deterministic policy working sets for enforcement. If a record is inside the configured window, it must be visible to the policy query. If it is outside any configured eviction dimension, it must not be visible. No approximate LRU, probabilistic summaries, background-lag surprises, or “maybe still in cache” behavior should affect enforcement decisions.

Deterministic eviction can still be configurable, but the semantics need to be simple: explicit bounds such as max_items, max_age_seconds, and max_bytes, applied identically on write and read, with no background relaxation.

That gives policy authors a predictable rule: inside every configured bound means visible; outside any configured bound means invisible. If that proves too hard to guarantee efficiently, MVP should prefer fewer supported eviction modes over best-effort behavior in enforcement paths.
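
That rule can be stated as one small window function over a newest-first record list; the field names (inserted_at, payload) and bound keys below are assumptions for the sketch:

defmodule WindowSketch do
  @doc """
  Applies deterministic eviction to a newest-first record list: a record is
  visible iff it is inside every configured bound.
  """
  def visible(records, bounds, now) do
    records
    |> limit_items(bounds[:max_items])
    |> limit_age(bounds[:max_age_seconds], now)
    |> limit_bytes(bounds[:max_bytes])
  end

  defp limit_items(records, nil), do: records
  defp limit_items(records, max), do: Enum.take(records, max)

  defp limit_age(records, nil, _now), do: records

  defp limit_age(records, max, now),
    do: Enum.take_while(records, &(now - &1.inserted_at <= max))

  defp limit_bytes(records, nil), do: records

  # Keep the contiguous newest prefix that fits in the byte budget.
  defp limit_bytes(records, max) do
    {kept, _} =
      Enum.reduce_while(records, {[], 0}, fn r, {kept, used} ->
        size = byte_size(r.payload)

        if used + size <= max,
          do: {:cont, {[r | kept], used + size}},
          else: {:halt, {kept, used}}
      end)

    Enum.reverse(kept)
  end
end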

Example:

policy_context:
  cache_mode: explicit
  facts:
    - id: session_recent_tool_calls
      source: receipts
      scope: session
      select:
        event_type: tool_call.finished
        fields: [tool_name, args_hash, result_hash, status]
      window:
        max_age_seconds: 1800
        max_items: 128
    - id: run_retry_count
      source: counters
      scope: run
      key: "retry:{synthetic_model}"
    - id: tenant_token_budget
      source: counters
      scope: tenant
      key: "tokens:{tenant_id}"
      window:
        max_age_seconds: 86400
  recent_records:
    - id: session_receipts
      api: receipts
      scope: session
      eviction:
        deterministic: true
        order: newest_first
        max_items: 50
        max_age_seconds: 1800
        max_bytes: 262144
      fields:
        - receipt_id
        - synthetic_model
        - final.status
        - decision.selected_model
        - final.events

The policy input receives only the declared facts and recent-record handles:

{
  "facts": {
    "session_recent_tool_calls": [
      {"tool_name": "browser", "args_hash": "h1", "result_hash": "r1", "status": "ok"},
      {"tool_name": "browser", "args_hash": "h1", "result_hash": "r1", "status": "ok"}
    ],
    "run_retry_count": {"value": 2},
    "tenant_token_budget": {"used": 812340, "limit": 1000000}
  },
  "recent": {
    "session_receipts": {
      "available": 14,
      "max_items": 50,
      "max_age_seconds": 1800
    }
  }
}

Advanced policy engines may get a constrained query primitive, but it should be served from these declared caches/ring buffers and bound by scope, result limit, time window, and fields. The call can look like a normal Wardwright query without being backed by unbounded historical storage.

For example:

ctx.receipts.list(
    scope = "session",
    where = {"event_type": "tool_call.finished", "tool_name": "browser"},
    limit = 20,
    max_age_seconds = 1800,
)

That is different from giving Starlark arbitrary access to /v1/receipts. The query is deterministic, scoped, authorized, served from a bounded cache, and visible in the synthetic model’s overhead estimate. If a policy asks for data outside the configured cache, Wardwright should return an explicit “not available” policy fact rather than silently scanning durable history.
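
A sketch of serving that query shape from a declared cache (names assumed): a query whose window or limit exceeds the configured cache returns an explicit not-available result instead of falling back to durable history.

defmodule CacheQuerySketch do
  @doc "Serves a bounded list query from a newest-first in-memory cache."
  def list(cache, opts) do
    limit = Keyword.fetch!(opts, :limit)
    max_age = Keyword.fetch!(opts, :max_age_seconds)
    where = Keyword.get(opts, :where, %{})

    cond do
      max_age > cache.max_age_seconds -> {:error, :not_available}
      limit > cache.max_items -> {:error, :not_available}
      true ->
        now = System.system_time(:second)

        hits =
          cache.records
          |> Enum.take_while(&(now - &1.inserted_at <= max_age))
          |> Enum.filter(fn r -> Enum.all?(where, fn {k, v} -> Map.get(r, k) == v end) end)
          |> Enum.take(limit)

        {:ok, hits}
    end
  end
end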

For observability-only caches, Wardwright may later allow approximate or best-effort eviction. Those caches must be labeled as such and should not be available to actions that change model behavior.

Action Model

Initial actions should be small and composable. The actions already named in this design (restrict_routes, switch_model, reroute, retry_with_reminder, rewrite_chunk, drop_chunk, block_final, annotate, alert, require_human_approval) form a reasonable starting set.

Actions should be phase-limited. For example, block_final is legal during output.finalizing, while reroute is legal during route.selecting or after a failed stream/output attempt, but not after bytes have already been released in pass-through mode.

alert and require_human_approval must not be treated as synonyms: alert is asynchronous and never blocks the returned result, while require_human_approval is an explicit blocking action that holds the attempt until an operator decides.

MVP Feature Set

The first real policy engine should ship with:

  1. request-phase built-ins for literal/regex match, metadata predicates, and prompt transform injection
  2. route-phase built-ins for context-window, retry-count, and budget predicates
  3. stream-phase buffered horizon with literal/regex match and retry/escalate
  4. final-output JSON and XML validation with retry-or-block
  5. scoped state for attempt, run, and session
  6. receipt events for every policy trigger and action
  7. UI visibility into required state, policy phase, action, latency, and whether output was released to the consumer

Starlark should initially target the same ABI and scopes. If a use case cannot be expressed with the ABI, that is a signal to adjust the ABI, not to give the policy language direct storage or network access.

Rule Composition And Arbitration

Governance rules should not be one unordered bag of effects. Wardwright should separate rule evaluation from action arbitration:

  1. Detectors inspect phase inputs and emit proposed actions. Literal, regex, parser, route, metadata, counter, and cache checks can usually run in parallel because they are pure reads.
  2. Arbiters combine proposed actions into a single deterministic phase decision. Any action that mutates a request, route, stream, retry state, or final output must pass through arbitration.

Every rule should declare an effect set:

id: no-deprecated-client
phase: response.streaming
match:
  regex: "OldClient\\("
mode:
  type: buffered_horizon
  holdback_bytes: 4096
action:
  type: retry_with_reminder
  reminder: "Do not use OldClient. Use NewClient instead."
  max_retries: 1
once_per:
  scope: session
effects:
  reads: [stream.window, session.triggered_rules]
  writes: [attempt.retry, request.system_reminder]
priority: 50

Validation should classify rule interactions before activation: independent rules whose detectors are pure parallel reads; ordered rules whose same-key writes resolve deterministically through declared priority; and conflicting rules whose same-key writes have no declared resolution and must be surfaced, as in decision.policy_conflicts, before activation.

The UI should surface those classes directly. If a policy can run detectors in parallel and arbitrate safely, it should say so. If a policy depends on order, priority, or conflict resolution, the user should see that before activation.
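
A deterministic same-key arbitration pass can be small. In the sketch below, each detector proposal carries a single write target (a simplification of the writes list above), a lower priority number wins, and ties break on rule id so the outcome never depends on evaluation order:

defmodule ArbitrationSketch do
  @doc """
  Takes detector proposals shaped like
  %{rule_id: String.t(), write: atom(), priority: integer(), action: map()}
  and returns {winning_actions, conflicts}.
  """
  def arbitrate(proposals) do
    proposals
    |> Enum.group_by(& &1.write)
    |> Enum.sort()
    |> Enum.reduce({[], []}, fn {key, group}, {wins, conflicts} ->
      [winner | losers] = Enum.sort_by(group, &{&1.priority, &1.rule_id})

      new_conflicts =
        for l <- losers, do: %{key: key, winner: winner.rule_id, loser: l.rule_id}

      {[winner.action | wins], conflicts ++ new_conflicts}
    end)
  end
end

The losing proposals become the same-key conflicts that receipts already summarize in decision.policy_conflicts.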

Policy State Machines

State machines should be a first-class authoring shape, not only an internal runtime concern. Many governance behaviors are clearer when described as named states, event-driven transitions, guard conditions, and entry/exit actions.

The product shape should be a deterministic state-machine artifact that can be edited directly by advanced users, constructed visually by most users, and reviewed through simulations. It should not require users to write BEAM process callbacks for normal policy work.

Example shape:

state_machine:
  id: deprecated-client-stream-guard
  scope: run
  initial: streaming
  states:
    streaming:
      on:
        stream.rule_matched:
          when:
            rule_id: deprecated-client
          transition: retrying
          actions:
            - retry_with_reminder
    retrying:
      budgets:
        max_retries: 1
      on:
        stream.rule_matched:
          transition: blocked
          actions:
            - block_final
        stream.completed:
          transition: completed
    blocked:
      terminal: true
    completed:
      terminal: true

That authoring model is different from arbitrary programmable policy: the transition graph, guards, budgets, and effects are declared data, so they can be diffed, simulated, and visualized without executing opaque code.

It is still compatible with BEAM runtime machinery. The compiler may choose to interpret a small machine inside a policy arbiter, compile a long-lived session/run machine into a gen_statem process, or generate a pure Gleam core for transition selection and have Elixir own the process lifecycle. Those are implementation choices behind the artifact. The activation validator should not accept arbitrary gen_statem callback modules as the default policy surface, because raw process code is much harder to diff, sandbox, simulate, and visualize. A future expert mode could import or attach code-backed machines, but only if they expose the same transition graph, effect declarations, simulator hooks, timeouts, and receipt trace spans.
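
For illustration only, the deprecated-client machine above compiled into a gen_statem could look like the following sketch; the event shapes and replies are assumptions, and real compiler output would also carry trace spans, timeouts, and budgets:

defmodule DeprecatedClientGuardSketch do
  @behaviour :gen_statem

  def start_link(opts \\ []), do: :gen_statem.start_link(__MODULE__, opts, [])

  @impl true
  def callback_mode, do: :state_functions

  @impl true
  def init(_opts), do: {:ok, :streaming, %{retries: 0}}

  # streaming: the first match of the deprecated-client rule moves to
  # retrying and asks the router to retry with a reminder.
  def streaming({:call, from}, {:rule_matched, "deprecated-client"}, data) do
    {:next_state, :retrying, %{data | retries: data.retries + 1},
     [{:reply, from, {:action, :retry_with_reminder}}]}
  end

  def streaming({:call, from}, _event, data),
    do: {:keep_state, data, [{:reply, from, :no_action}]}

  # retrying: the max_retries: 1 budget is spent, so a second match blocks.
  def retrying({:call, from}, {:rule_matched, _rule}, data),
    do: {:next_state, :blocked, data, [{:reply, from, {:action, :block_final}}]}

  def retrying({:call, from}, :stream_completed, data),
    do: {:next_state, :completed, data, [{:reply, from, :no_action}]}

  # Terminal states only acknowledge further events.
  def blocked({:call, from}, _event, data),
    do: {:keep_state, data, [{:reply, from, :terminal}]}

  def completed({:call, from}, _event, data),
    do: {:keep_state, data, [{:reply, from, :terminal}]}
end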

The authoring UI should therefore treat state machines as a structured policy builder rather than a code escape hatch.

This gives policy authors a local way to reduce complexity without forcing the entire governance system into one monolithic workflow engine.

AI-Assisted Authoring

Wardwright should include a policy-authoring assistant that uses an operator-selected backing model to help draft, explain, review, and refine governance rules. This assistant is not the runtime policy engine. It proposes artifacts; the compiler, validator, simulator, and human review path remain authoritative.

The assistant should support drafting a rule from a natural-language description of intent, explaining an existing artifact, reviewing a proposed change, and generating simulation scenarios for a draft.

Because the assistant may use the user’s configured provider credentials, every assistant run should make model choice and data sharing explicit: which backing model is called, under whose credentials, and exactly which rule drafts or request context leave the boundary to that provider.

The storage artifact should be deterministic YAML or TOML that can be reviewed, diffed, signed, and activated. Advanced users can edit it directly, but normal users should work through the assistant, graph, simulator, and review UI.

Simulation And Generated Tests As UX

Simulation should be a first-class policy authoring surface, not only a CI test. For each draft rule, Wardwright should generate examples and counterexamples that make the policy’s promise visible.

The current dependency choice keeps StreamData in the test profile, but it is worth an explicit production experiment. StreamData-style constrained generators could be useful for simulation when the operator wants bounded, reproducible scenario space: regex near-misses, chunk boundaries, cache-window edges, metadata combinations, and policy conflict cases. That is different from live-LLM scenario generation, which is better for realistic language and unexpected adversarial phrasing. Wardwright should compare these generator shapes directly before promoting any test library into production; a sketch of the constrained-generator option follows.

The product constraint is that generated scenarios must be explainable, replayable, and pin-able. If a production generator cannot provide stable seeds, clear shrink/counterexample output, and bounded runtime, it should remain a development tool rather than part of the policy-authoring UI.
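
As a concrete shape for that experiment, a StreamData generator that splits a trigger literal across random chunk boundaries might look like this sketch (module name and scenario shape are assumptions; stream_data remains a test-profile dependency today):

defmodule ChunkBoundarySketch do
  @trigger "OldClient("

  @doc """
  Generates lists of provider chunks whose concatenation contains the
  trigger literal, re-chunked at a random boundary so the match can
  straddle two events. Returns a StreamData generator of chunk lists.
  """
  def split_trigger_chunks do
    StreamData.bind(StreamData.string(:alphanumeric, max_length: 64), fn prefix ->
      StreamData.bind(StreamData.string(:alphanumeric, max_length: 64), fn suffix ->
        text = prefix <> @trigger <> suffix

        StreamData.map(StreamData.integer(1..byte_size(text)), fn cut ->
          <<a::binary-size(cut), b::binary>> = text
          Enum.reject([a, b], &(&1 == ""))
        end)
      end)
    end)
  end
end

Under ExUnitProperties, check all can drive the stream evaluator with these chunk lists, shrink failures toward minimal splits, and let a failing seed be pinned as a regression fixture.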

For TTSR rules, generated cases should include triggers split across chunk boundaries, near-miss strings that must not fire, and held windows sized just above and below the configured horizon and latency budgets.

The UI should show generated checks as user-readable evidence: concrete example outputs, each labeled with whether the rule released, rewrote, retried, or blocked it, and why.

Users should be able to pin a generated counterexample as a regression fixture. That creates a direct loop: describe policy, compile artifact, simulate, inspect counterexample, revise, and activate only when the behavior matches intent.

Code-First Policy Visualization

Programmable policy does not automatically make simulation harder. It makes pre-execution explanation harder unless the host exposes a constrained policy API and enough trace data to connect source code to runtime behavior.

Wardwright should evaluate two authoring MVPs in parallel:

  1. Structured primitives first: policy authors compose built-in detectors, counters, stream guards, route switches, and arbiters. The UI can visualize the rule graph before simulation because the policy shape is explicit.
  2. Starlark-first / code-first: policy authors write small deterministic policy functions against the same ABI. The UI visualizes syntax structure, source spans, execution traces, scenario deltas, and opaque branches instead of pretending it can statically understand arbitrary code perfectly.

The Starlark-first UI should project code into a policy-shaped graph: detectors, guards, and actions recovered from the source where possible, with genuinely opaque branches marked as opaque rather than guessed at.

Simulation then overlays execution evidence onto that projection: which branches fired, which source spans matched, and how each scenario delta changed the final decision.

Implementation options should be chosen for the layer being tested.

The decisive comparison is not expressiveness. Programmable policy will always be more expressive. The product question is whether a technical policy author can predict, review, and debug behavior faster with structured primitives or with code plus AST/trace visualization.