Version: MVP

Running Policy Tests

4 min read · For release engineers · Updated 2026-05-19

What you'll do

Run the Craik policy regression harness — locally and in CI — so the governance contract stays enforced across every change. By the end you'll know what's checked, how to read the report, and how to integrate the gate into your release pipeline.

The release gate

Run the policy gate
craik policy test

The command prints a JSON craik.policy_test_report. A passing report exits with status code 0. Any failed check exits non-zero and includes the violated check's name plus a failure message.
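For scripting around the gate, the exit status and the JSON report can be captured together. The following is a minimal sketch, assuming stdout contains only the JSON report as described above; `run_gate` is a hypothetical helper name, not part of Craik.

```python
import json
import subprocess
from typing import Any


def run_gate(argv: list[str]) -> tuple[int, dict[str, Any]]:
    """Run the gate command and return (exit_code, parsed_report).

    Assumption: the command writes nothing but the JSON report to stdout.
    """
    # check=False: a non-zero exit is an expected outcome here, not an exception.
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    report = json.loads(proc.stdout)
    return proc.returncode, report


# Typical call, left commented out so the sketch imports cleanly
# on machines without Craik installed:
# code, report = run_gate(["craik", "policy", "test"])
```

A caller would normally treat the exit code as the source of truth for pass or fail and use the parsed report only for diagnostics.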

Why a separate gate?

Unit tests verify that the runtime behaves; the policy gate verifies that the runtime keeps its promises. These are different categories of failure (a crashed function versus quietly widened authority), and giving each its own gate keeps both visible and fixable on their own terms.

What the gate checks

Immutable path protection

Writes to declared immutable paths require both approval metadata and a matching repo.write.immutable grant.
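The rule above is a conjunction: approval alone is not enough, and neither is the grant alone. The sketch below models that shape only. It is not Craik's implementation; the function names and the prefix-based path matching are assumptions made for illustration.

```python
from pathlib import PurePosixPath

IMMUTABLE_GRANT = "repo.write.immutable"  # grant name taken from the rule above


def is_immutable(path: str, immutable_paths: list[str]) -> bool:
    """Assumption: a path is immutable if it sits at or under a declared path."""
    target = PurePosixPath(path)
    return any(target.is_relative_to(PurePosixPath(p)) for p in immutable_paths)


def immutable_rule_permits(
    path: str, immutable_paths: list[str], approved: bool, grants: set[str]
) -> bool:
    """Return True if this one rule does not block the write."""
    if not is_immutable(path, immutable_paths):
        return True  # the rule only constrains declared immutable paths
    # Both conditions are required: approval metadata AND the matching grant.
    return approved and IMMUTABLE_GRANT in grants
```

A regression check for this invariant would exercise all three denial cases: approval without the grant, the grant without approval, and neither.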

Memory updates default to proposals

Direct local memory writes without a memory.write grant are denied; the runtime emits a proposal instead.

Trusted-local fail-open receipts

When the trusted-local profile takes a fail-open path, the runtime seals a receipt with fail_open: true.

Automation stays fail-closed

Automation runs stop on uncertainty rather than widen authority. The stop is itself receipted.
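The two postures above differ only in what happens at the moment of uncertainty, and both leave a receipt. A toy model, assuming hypothetical profile names and a hypothetical receipt shape (only the `fail_open: true` field comes from the text):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    proceed: bool
    receipt: dict


def on_uncertainty(profile: str) -> Outcome:
    """Model how a run reacts when the policy decision is uncertain."""
    if profile == "trusted-local":
        # Fail-open: continue, but seal a receipt that records the fact.
        return Outcome(proceed=True, receipt={"fail_open": True})
    # Any other profile, automation included, stops in this sketch.
    # The "event" field is illustrative; the real receipt schema is not shown here.
    return Outcome(proceed=False, receipt={"event": "stopped_on_uncertainty"})
```

Treating every unrecognized profile as fail-closed is a choice made in this sketch, not something the source specifies.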

Runner grant boundaries are tracked

Runners may exercise only the capability grants declared in the policy envelope, so the boundary stays tight as new runner adapters land.

Redaction regressions

Receipts, logs, handoffs, and case files all pass through the central redaction guard before persistence — verified on representative payload shapes.

CI usage

Run the policy gate alongside the rest of the validation sweep:

The full pre-PR sweep
uv run --python 3.12 --extra dev pytest
uv run --python 3.12 --extra dev ruff check .
uv run --python 3.12 --extra dev mypy
uv run --python 3.12 --extra dev craik policy test

In source-tree development specifically:

Source-tree run
uv run --python 3.12 --extra dev craik policy test

The Craik CI workflow runs this same command on every PR. If the policy gate fails, the PR cannot merge — by design.
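If you prefer a single entry point for the pre-PR sweep listed in this section, the four commands can be chained with fail-fast behavior. This is a sketch, not the Craik CI workflow itself; `run_sweep` and the injectable `run` parameter are hypothetical names introduced so the logic can be exercised without the tools installed.

```python
import subprocess
from collections.abc import Callable, Sequence

UV = ["uv", "run", "--python", "3.12", "--extra", "dev"]

# The same four steps as the pre-PR sweep, in the same order.
SWEEP: list[list[str]] = [
    UV + ["pytest"],
    UV + ["ruff", "check", "."],
    UV + ["mypy"],
    UV + ["craik", "policy", "test"],
]


def _exit_code(argv: Sequence[str]) -> int:
    """Run one step and return its exit status without raising."""
    return subprocess.run(list(argv), check=False).returncode


def run_sweep(
    steps: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = _exit_code,
) -> int:
    """Run steps in order; return the first non-zero exit code, else 0."""
    for argv in steps:
        code = run(argv)
        if code != 0:
            print(f"FAILED ({code}): {' '.join(argv)}")
            return code
    return 0
```

Calling `sys.exit(run_sweep(SWEEP))` from a small script propagates the failing step's status, so a failed policy gate fails the whole sweep.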

Reading the report

craik policy test --json emits a machine-readable report. A passing shape:

Passing report
{
  "schema": "craik.policy_test_report",
  "version": "0.1.0",
  "status": "pass",
  "checks": [
    { "name": "immutable_paths", "status": "pass" },
    { "name": "memory_proposal_default", "status": "pass" },
    { "name": "trusted_local_fail_open_receipt", "status": "pass" },
    { "name": "automation_fail_closed", "status": "pass" },
    { "name": "runner_grant_boundary", "status": "pass" },
    { "name": "redaction_regressions", "status": "pass" }
  ]
}

A failing report names the violated check and includes an error.message field with the failure detail. Don't paper over failures: fix the underlying issue and re-run.
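To surface failures in CI logs, a small helper can pull the failed checks out of a report. The sample below is hypothetical: the source shows only the passing shape, so placing the `error` object on the individual check entry, and the message text itself, are assumptions.

```python
import json
from typing import Any

# Hypothetical failing report; only the top-level fields and check names
# are taken from the passing shape shown above.
SAMPLE = """
{
  "schema": "craik.policy_test_report",
  "version": "0.1.0",
  "status": "fail",
  "checks": [
    {"name": "immutable_paths", "status": "pass"},
    {"name": "redaction_regressions", "status": "fail",
     "error": {"message": "example failure detail"}}
  ]
}
"""


def failed_checks(report: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (check name, failure message) for every check that did not pass."""
    failures = []
    for check in report.get("checks", []):
        # Anything other than "pass" counts as a failure, so this does not
        # depend on the exact status string a failing check uses.
        if check.get("status") != "pass":
            message = (check.get("error") or {}).get("message", "<no message>")
            failures.append((check.get("name", "<unnamed>"), message))
    return failures


for name, message in failed_checks(json.loads(SAMPLE)):
    print(f"{name}: {message}")
```

If the real report places `error` somewhere else, for example at the top level, only the lookup inside the loop needs to change.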

When to extend the gate

Add a check when:

  1. A new capability lands (a new memory write path, a new sandbox backend, a new credential transport). The gate should verify the default posture before merging.

  2. A real incident reveals a missed invariant. Don't fix the bug without adding a regression check; the gate is the institutional memory.

  3. A policy profile changes. New profiles need their own pass/fail cases.

Don't add checks to verify "the tests pass" — that's pytest's job. The policy gate is for governance invariants specifically.

State the gate touches

The harness uses local Craik state and may create a pending local memory proposal inside the selected CRAIK_HOME. This is intentional: proving the proposal flow works requires creating one. If you want to keep that proposal out of your regular Craik state, point CRAIK_HOME at a scratch directory:

Sandboxed policy test
CRAIK_HOME=/tmp/craik-policy-test craik policy test
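The same isolation can be scripted with a throwaway directory that is removed after the run. This is a sketch under one assumption: that the gate can start from an empty CRAIK_HOME. `scratch_env` and `run_in_scratch` are hypothetical helper names.

```python
import os
import subprocess
import tempfile


def scratch_env(craik_home: str) -> dict[str, str]:
    """Copy the current environment with CRAIK_HOME pointed at craik_home."""
    env = dict(os.environ)  # a copy, so the parent environment is untouched
    env["CRAIK_HOME"] = craik_home
    return env


def run_in_scratch(argv: list[str]) -> int:
    """Run argv with a temporary CRAIK_HOME and return its exit status."""
    # The directory, and any pending proposal created inside it,
    # is deleted when the with-block exits.
    with tempfile.TemporaryDirectory(prefix="craik-policy-test-") as home:
        return subprocess.run(argv, env=scratch_env(home), check=False).returncode


# Typical call, commented out so the sketch runs without Craik installed:
# exit_code = run_in_scratch(["craik", "policy", "test"])
```

Because the directory is deleted afterwards, use a fixed path instead when you need to inspect the proposal the harness created.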

What's next