Use Cases

These examples are working hypotheses for Wardwright’s first policy library and test suite. They are intentionally focused on constrained agentic workflows where the operator knows what failure looks like.

Status: these are not finished product claims. They are candidate scenarios for docs-driven development: each one should become a reproducible test, a receipt shape, and eventually a UI view.

Evidence Themes

Public production-agent writeups repeatedly report the same pain points: retry loops and tool spam, malformed structured output, weak traceability, unclear handoff points, and hard-to-debug failures that look successful at the end of the run.

Escalation Vocabulary

Wardwright should use two different terms:

Where this document says “human escalation” for MVP examples, read it as asynchronous alerting unless the action is explicitly named require_human_approval.

Candidate Policy Examples

Ambiguous Success

An agent says the job is done, but the expected artifact, ticket update, database record, or customer-visible output is missing.

Wardwright policy:

Falsifiable value:

Tool Loop Or Tool Spam

The agent repeats the same tool or provider request without meaningful state change, often consuming budget while appearing active.
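
A minimal sketch of how such a loop might be detected, assuming each tool call is recorded as a session-scoped fact of tool name, args hash, and result hash. `ToolLoopDetector`, `threshold`, and `window` are hypothetical names for illustration, not Wardwright APIs.

```python
import hashlib
import json
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolFact:
    tool: str
    args_hash: str
    result_hash: str


def stable_hash(value) -> str:
    """Hash a JSON-serializable value; sorted keys make equal dicts hash equally."""
    payload = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToolLoopDetector:
    """Hypothetical detector: trips on N consecutive identical tool facts."""

    def __init__(self, threshold: int = 3, window: int = 20):
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # bounded per-session history

    def observe(self, tool: str, args, result) -> bool:
        fact = ToolFact(tool, stable_hash(args), stable_hash(result))
        self.recent.append(fact)
        # Count how many of the most recent facts are identical to this one.
        streak = 0
        for past in reversed(self.recent):
            if past != fact:
                break
            streak += 1
        return streak >= self.threshold
```

Including the result hash in the fact means a repeated call whose result changes is treated as progress rather than as a loop. What the policy does once `observe` returns `True` (alert, inject, or reroute) is left open here.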

Wardwright policy:

Falsifiable value:

Structured Output Boundary

Downstream systems expect JSON, XML, or another machine-readable contract, but the model returns malformed or semantically incomplete output.
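
As an illustration of the boundary, the sketch below checks a JSON output for both parse errors and missing or mistyped fields before release. The function name `check_output`, the decision labels, and the `required` field map are assumptions made for this example; a real contract would more likely be a full schema.

```python
import json


def check_output(raw: str, required: dict, attempt: int, max_retries: int = 2):
    """Return (decision, errors), where decision is 'release', 'retry', or 'block'.

    `required` maps field names to expected Python types (hypothetical contract).
    """
    errors = []
    data = None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        errors.append(f"malformed JSON: {exc.msg}")

    if not errors:
        if not isinstance(data, dict):
            errors.append("top-level value is not an object")
        else:
            # Semantic completeness: every required field present with the right type.
            for field, expected_type in required.items():
                if field not in data:
                    errors.append(f"missing field: {field}")
                elif not isinstance(data[field], expected_type):
                    errors.append(f"wrong type for field: {field}")

    if not errors:
        return "release", errors
    # Retry while attempts remain, then block instead of releasing bad output.
    return ("retry" if attempt < max_retries else "block"), errors
```

Returning the error list alongside the decision gives a receipt something concrete to record for each rejected attempt.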

Wardwright policy:

Falsifiable value:

Context And Cost Budget

A workflow gradually grows context until smaller models fail, latency increases, or expensive fallbacks become the default.
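
One way to make this drift visible is an explicit budget check before each call. The sketch below is an assumption-heavy illustration: `Budget`, its fields, and the decision labels `allow`, `alert`, `degrade`, and `block` are hypothetical, and token estimates are taken as given.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    context_limit: int        # max tokens the cheap route accepts per call
    session_limit: int        # max tokens allowed across the whole session
    alert_percent: int = 80   # alert once projected spend reaches this share
    spent: int = 0

    def decide(self, estimated_tokens: int) -> str:
        """Pure decision; call record() separately once the call is made."""
        projected = self.spent + estimated_tokens
        if projected > self.session_limit:
            return "block"
        if estimated_tokens > self.context_limit:
            # Too large for the cheap route: degrade explicitly rather than
            # silently falling back to an expensive one.
            return "degrade"
        if projected * 100 >= self.alert_percent * self.session_limit:
            return "alert"
        return "allow"

    def record(self, actual_tokens: int) -> None:
        self.spent += actual_tokens
```

Keeping `decide` free of side effects makes it straightforward to generate calls near each threshold and assert the decision.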

Wardwright policy:

Falsifiable value:

Prompt Experiment Guardrails

Operators want to test prompt preambles, postscripts, or model variants without turning every agent integration into a bespoke experiment.
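
If every run emits a receipt tagged with its prompt transform version, comparing variants reduces to a grouping step. The sketch below assumes a receipt is a plain dict with hypothetical keys `transform_version`, `outcome`, and `latency_ms`; the real receipt shape is not defined in this document.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_transform(receipts):
    """Group receipts by prompt transform version and summarize each group."""
    groups = defaultdict(list)
    for receipt in receipts:
        groups[receipt["transform_version"]].append(receipt)

    return {
        version: {
            "runs": len(items),
            "success_rate": sum(r["outcome"] == "success" for r in items) / len(items),
            "mean_latency_ms": mean(r["latency_ms"] for r in items),
        }
        for version, items in groups.items()
    }
```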

Wardwright policy:

Falsifiable value:

Spike Candidates

These are concrete experiments that can become examples, BDD scenarios, and property generators.

| Direction | Value hypothesis | Data needed | First test |
| --- | --- | --- | --- |
| JSON/XML repair gate | Reduces downstream parser and semantic-contract failures. | Output buffer, schema/parser errors, retry count. | Generate malformed and semantically incomplete outputs; assert retry or block before release. |
| Session tool-loop detector | Reduces repeated tool/provider calls that spend tokens without changing state. | Session-scoped tool name, args hash, result hash, status. | Generate repeated identical tool facts; assert alert/inject/reroute at threshold. |
| TTSR deprecated-pattern guard | Saves context until a rule matters while preventing known bad output from reaching consumers. | Stream ring buffer, trigger offset, one-shot rule state. | Generate streams with trigger split across chunks; assert trigger before release. |
| Async operator alert sink | Improves visibility without claiming synchronous human approval. | Receipt event, sink status, delivery attempt metadata. | Trip a policy; assert receipt event and sink delivery record even if sink fails. |
| Approval gate | Enables true human review for irreversible actions, but requires persistence and timeout semantics. | Pending request state, approval token, deadline, resume decision. | Simulate approve/reject/edit with timeout and idempotent resume. |
| Prompt experiment receipts | Makes Wardwright useful as a prompt experiment boundary. | Prompt transform version, route, outcome labels, latency/cost. | Run A/B variants over fixture tasks; assert receipts can group by transform version. |
| Cost/context budget guard | Prevents silent migration from cheap/fast routes to expensive/slow routes. | Estimated tokens, route selection, rolling run/session/tenant budget. | Generate calls near budget/context thresholds; assert route/degrade/alert decisions. |
| Trace-to-regression loop | Turns production incidents into durable examples. | Receipt timeline, policy events, failure label, expected future behavior. | Import a labeled receipt; generate a BDD fixture that fails before the policy is added. |
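
The deprecated-pattern guard's first test depends on catching a trigger that is split across stream chunks. The sketch below shows one way that could work, assuming a single literal pattern: it holds back the last `len(pattern) - 1` characters of the stream so a match spanning two chunks is seen before any of it is released. `StreamGuard` is a hypothetical name, and a held tail stands in here for the ring buffer named in the table.

```python
class StreamGuard:
    """Hypothetical one-shot guard for a single literal trigger pattern."""

    def __init__(self, pattern: str):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.held = ""        # unreleased tail that could still start a match
        self.tripped = False  # one-shot rule state

    def feed(self, chunk: str) -> str:
        """Return the text that is safe to release for this chunk."""
        if self.tripped:
            return ""
        buffer = self.held + chunk
        index = buffer.find(self.pattern)
        if index != -1:
            # Trigger found: release only the text before it, then stop.
            self.tripped = True
            self.held = ""
            return buffer[:index]
        # Hold back enough characters to cover a match split across chunks.
        cut = max(len(buffer) - (len(self.pattern) - 1), 0)
        self.held = buffer[cut:]
        return buffer[:cut]

    def finish(self) -> str:
        """Flush the held tail at end of stream if the rule never tripped."""
        tail, self.held = self.held, ""
        return "" if self.tripped else tail
```

For example, feeding `"import os\nos.sys"` and then `"tem('ls')"` to a guard for `"os.system("` releases only `"import os\n"` in total and sets `tripped`. The cost is a small release delay of at most `len(pattern) - 1` characters.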

Policy Engine Implications

These examples need different data scopes. Structured output can run on the current attempt. Tool-loop detection needs recent events from the same run or session. DoS (denial-of-service) controls eventually need tenant/user-level windows. Wardwright should therefore make policy state explicit and bounded instead of giving policy code arbitrary access to receipts or storage.
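
To make "explicit and bounded" concrete, the sketch below gives a policy access only to windows for scopes it declared, each capped by event count and age. All names here (`Scope`, `BoundedWindow`, `PolicyState`) and the limit format are assumptions for illustration, not the proposed design.

```python
from collections import deque
from enum import Enum


class Scope(Enum):
    ATTEMPT = "attempt"
    SESSION = "session"
    TENANT = "tenant"


class BoundedWindow:
    """Keeps at most max_events events, none older than max_age_s seconds."""

    def __init__(self, max_events: int, max_age_s: float):
        self.max_age_s = max_age_s
        self._events = deque(maxlen=max_events)

    def add(self, timestamp: float, event: dict) -> None:
        self._events.append((timestamp, event))

    def view(self, now: float) -> list:
        # Drop expired events before exposing the window to policy code.
        while self._events and now - self._events[0][0] > self.max_age_s:
            self._events.popleft()
        return [event for _, event in self._events]


class PolicyState:
    """Hands out windows only for scopes the policy declared up front."""

    def __init__(self, limits: dict):
        self._limits = limits  # Scope -> (max_events, max_age_s)
        self._windows = {}

    def window(self, scope: Scope, key: str) -> BoundedWindow:
        if scope not in self._limits:
            raise PermissionError(f"policy did not declare scope {scope.value}")
        if (scope, key) not in self._windows:
            max_events, max_age_s = self._limits[scope]
            self._windows[(scope, key)] = BoundedWindow(max_events, max_age_s)
        return self._windows[(scope, key)]
```

A tool-loop policy would declare only `Scope.SESSION`, so a request for tenant-level data fails loudly instead of quietly reading storage.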

See Policy Engine MVP for the proposed initial phases, state scopes, and action model.