Running Policy Tests
What you'll do
Run the Craik policy regression harness — locally and in CI — so the governance contract stays enforced across every change. By the end you'll know what's checked, how to read the report, and how to integrate the gate into your release pipeline.
The release gate
craik policy test
The command prints a JSON report with the schema craik.policy_test_report.
A passing run exits with status code 0. Any failed check makes the command
exit non-zero, and the report includes the violated check's name plus a
failure message.
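The exit-status contract above is what a wrapper script would key on. The following is a minimal sketch, assuming only that craik is on PATH and that status 0 means pass; the helper name run_gate is hypothetical and not part of Craik.

```python
import subprocess
import sys
from typing import Sequence


def run_gate(command: Sequence[str] = ("craik", "policy", "test")) -> bool:
    """Run the gate command and return True only when it exits with status 0.

    run_gate is a hypothetical helper for illustration, not a Craik API.
    """
    completed = subprocess.run(list(command), capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface whatever the command printed so the failing check is visible.
        sys.stderr.write(completed.stdout)
        sys.stderr.write(completed.stderr)
    return completed.returncode == 0
```

A caller could then end a release script with `sys.exit(0 if run_gate() else 1)` so the non-zero status propagates to whatever invoked it.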
Why a separate gate?
Unit tests verify that the runtime behaves; the policy gate verifies that the runtime keeps its promises. A crashed function and a quietly widened authority are different categories of failure, and checking them separately ensures that each one is actually addressed rather than lost among the other.
What the gate checks
Immutable path protection
Writes to declared immutable paths require both approval metadata and a matching repo.write.immutable grant.
Memory updates default to proposals
Direct local memory writes without a memory.write grant are denied; the runtime emits a proposal instead.
Trusted-local fail-open receipts
When the trusted-local profile takes a fail-open path, the runtime seals a receipt with fail_open: true.
Automation stays fail-closed
Automation runs stop on uncertainty rather than widen authority. The stop is itself receipted.
Runner grant boundaries are tracked
Capability grants declared in the policy envelope are the only ones runners may exercise. The boundary stays tight as runner adapters land.
Redaction regressions
Receipts, logs, handoffs, and case files all pass through the central redaction guard before persistence — verified on representative payload shapes.
CI usage
Run the policy gate alongside the rest of the validation sweep:
uv run --python 3.12 --extra dev pytest
uv run --python 3.12 --extra dev ruff check .
uv run --python 3.12 --extra dev mypy
uv run --python 3.12 --extra dev craik policy test
In source-tree development specifically:
uv run --python 3.12 --extra dev craik policy test
The Craik CI workflow runs this same command on every PR. If the policy gate fails, the PR cannot merge — by design.
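For local use, the four sweep commands listed above can be chained so that the run stops at the first failure. This is a sketch only: the uv invocations are copied from this page, while run_sweep and its injectable runner are hypothetical names introduced for illustration and say nothing about how the Craik CI workflow itself is written.

```python
import subprocess
from typing import Callable, Sequence

# Prefix shared by every sweep command on this page.
UV_PREFIX = ["uv", "run", "--python", "3.12", "--extra", "dev"]

# The validation sweep, in the order shown above.
SWEEP = [
    ["pytest"],
    ["ruff", "check", "."],
    ["mypy"],
    ["craik", "policy", "test"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def run_sweep(
    steps: Sequence[Sequence[str]] = SWEEP,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run each step in order; return the first non-zero status, else 0."""
    for step in steps:
        status = runner(UV_PREFIX + list(step))
        if status != 0:
            return status  # fail fast: later steps are skipped
    return 0
```

The runner parameter exists only so the sequencing can be exercised without invoking the real tools.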
Reading the report
craik policy test --json emits a machine-readable report. A passing
shape:
{
"schema": "craik.policy_test_report",
"version": "0.1.0",
"status": "pass",
"checks": [
{ "name": "immutable_paths", "status": "pass" },
{ "name": "memory_proposal_default", "status": "pass" },
{ "name": "trusted_local_fail_open_receipt", "status": "pass" },
{ "name": "automation_fail_closed", "status": "pass" },
{ "name": "runner_grant_boundary", "status": "pass" },
{ "name": "redaction_regressions", "status": "pass" }
]
}
A failing report names the violated check and includes an error.message
field with the failure detail. Don't paper over failures; fix the
underlying issue and re-run.
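A script that consumes the report might collect the failing checks as sketched below. The passing shape is taken from this page, but the failing shape is not shown here, so the sketch assumes each failing entry in checks carries "status": "fail" and a nested error object with a message key. That placement, the sample data, and the failed_checks helper are all assumptions for illustration.

```python
from typing import Any


def failed_checks(report: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (check name, failure message) pairs for every non-passing check."""
    failures = []
    for check in report.get("checks", []):
        if check.get("status") != "pass":
            # ASSUMPTION: the failure detail lives at check["error"]["message"].
            error = check.get("error") or {}
            failures.append((check.get("name", "<unnamed>"), error.get("message", "")))
    return failures


# Hypothetical failing report, invented for this example only.
SAMPLE_FAILING_REPORT = {
    "schema": "craik.policy_test_report",
    "version": "0.1.0",
    "status": "fail",
    "checks": [
        {"name": "immutable_paths", "status": "pass"},
        {
            "name": "automation_fail_closed",
            "status": "fail",
            "error": {"message": "example failure detail"},
        },
    ],
}

for name, message in failed_checks(SAMPLE_FAILING_REPORT):
    print(f"{name}: {message}")
```

If the real report places error.message at the top level instead, only the lookup inside the loop would need to change.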
When to extend the gate
Add a check when:
A new capability lands (a new memory write path, a new sandbox backend, a new credential transport). The gate should verify the default posture before merging.
A real incident reveals a missed invariant. Don't fix the bug without adding a regression — the gate is the institutional memory.
A policy profile changes. New profiles need their own pass/fail cases.
Don't add checks to verify "the tests pass" — that's pytest's job. The policy gate is for governance invariants specifically.
State the gate touches
The harness uses local Craik state and may create a pending local
memory proposal inside the selected CRAIK_HOME. This is intentional:
proving the proposal flow works requires creating one. If you're
sensitive about pollution, point CRAIK_HOME at a scratch directory:
CRAIK_HOME=/tmp/craik-policy-test craik policy test
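The same isolation can be scripted with a throwaway directory that is removed afterwards. This is a minimal sketch, assuming only that the harness honors the CRAIK_HOME environment variable as described above; scratch_env and run_gate_in_scratch are hypothetical helper names.

```python
import os
import subprocess
import tempfile


def scratch_env(scratch_dir: str) -> dict[str, str]:
    """Copy the current environment with CRAIK_HOME redirected to scratch_dir."""
    env = dict(os.environ)
    env["CRAIK_HOME"] = scratch_dir
    return env


def run_gate_in_scratch() -> int:
    """Run the gate against a temporary CRAIK_HOME and return its exit status."""
    # The directory, and any pending proposal created in it, is deleted on exit.
    with tempfile.TemporaryDirectory(prefix="craik-policy-test-") as scratch_dir:
        completed = subprocess.run(
            ["craik", "policy", "test"], env=scratch_env(scratch_dir)
        )
        return completed.returncode
```

Because the environment is copied rather than mutated, the CRAIK_HOME of the calling process is left untouched.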