Version: MVP

Single-Agent Execution Loop

5 min read · Core concept · Updated 2026-05-19

What you'll learn

The six bounded phases that make up the v0.1 single-agent loop.
The safety boundaries Craik checks before each step.
How runner output becomes typed run output and (sometimes) memory proposals.
The recovery contract a resuming agent must honor.

The single-agent execution loop is the v0.1 contract that lets one runner work through a governed task without depending on an untracked chat transcript. The runner does the reasoning; Craik owns the durable boundary: run state, policy checks, receipts, step outputs, memory proposals, and recovery context.

Lifecycle

A run starts as a craik.task_run with status pending and phase plan. The executor moves through bounded steps:

Phase | Status it produces | Purpose
plan | → running | Decide the next bounded action under the intent lock.
act | → running | Perform an approved side effect or propose one.
observe | → running | Capture output, diagnostics, receipts, and artifacts.
evaluate | → running | Decide whether to stop, continue, block, or fail.
continue | → running | Advance to another bounded iteration when needed.
stop | → terminal | Finalize state, handoff, receipts, and recovery context.

Terminal statuses are completed, blocked, failed, and interrupted. Interrupted runs preserve enough local state for inspection and later recovery.
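The lifecycle above can be pictured as a small state machine. The following is a minimal sketch, not Craik's implementation: the names `Phase`, `Status`, `TaskRun`, and `advance` are hypothetical, and the strict plan, act, observe, evaluate ordering is an assumption, since the table lists the phases without formal transition rules.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(str, Enum):
    PLAN = "plan"
    ACT = "act"
    OBSERVE = "observe"
    EVALUATE = "evaluate"
    CONTINUE = "continue"
    STOP = "stop"


class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    BLOCKED = "blocked"
    FAILED = "failed"
    INTERRUPTED = "interrupted"


# The four terminal statuses named in the text.
TERMINAL = frozenset(
    {Status.COMPLETED, Status.BLOCKED, Status.FAILED, Status.INTERRUPTED}
)

# Assumed ordering inside one iteration (not stated as a rule in the source).
NEXT_PHASE = {
    Phase.PLAN: Phase.ACT,
    Phase.ACT: Phase.OBSERVE,
    Phase.OBSERVE: Phase.EVALUATE,
    Phase.CONTINUE: Phase.PLAN,
}


@dataclass(frozen=True)
class TaskRun:
    # A run starts with status pending and phase plan.
    status: Status = Status.PENDING
    phase: Phase = Phase.PLAN


def advance(run: TaskRun, keep_going: bool = True,
            outcome: Status = Status.COMPLETED) -> TaskRun:
    """Execute the current phase and return the resulting run state."""
    if run.status in TERMINAL:
        raise ValueError("a terminal run does not advance")
    if run.phase is Phase.STOP:
        # stop is the only phase that produces a terminal status.
        if outcome not in TERMINAL:
            raise ValueError("stop must produce a terminal status")
        return TaskRun(status=outcome, phase=Phase.STOP)
    if run.phase is Phase.EVALUATE:
        # evaluate decides between another iteration and stopping.
        next_phase = Phase.CONTINUE if keep_going else Phase.STOP
    else:
        next_phase = NEXT_PHASE[run.phase]
    # Every non-stop phase leaves the run in the running status.
    return TaskRun(status=Status.RUNNING, phase=next_phase)
```

Keeping `TaskRun` immutable means each call returns a new state, which makes it straightforward to persist every intermediate state for later inspection.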

Safety boundaries

Before each step, Craik enforces three checks. None of them are advisory — violating any of them halts the loop.

Intent lock

If a step would trigger a configured stop condition, the run halts before the runner receives another request.

Capability grant

Side effects (shell, file write, memory write) need a matching grant. Denied side effects produce denial receipts and block the run.

Iteration ceiling

max_iterations bounds the loop. Reaching the bound interrupts the run instead of continuing indefinitely.
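The three checks above could be expressed as a single gate that runs before each step request. This is a sketch under stated assumptions: `StepContext`, `Verdict`, and `check_before_step` are hypothetical names, the order of the checks is assumed, and the status produced by an intent-lock halt is left as `None` because the text does not name it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class StepContext:
    iteration: int                      # iterations already used
    max_iterations: int                 # the configured ceiling
    stop_condition_hit: bool = False    # intent-lock stop condition triggered
    side_effect: Optional[str] = None   # e.g. "shell", "file_write", "memory_write"
    grants: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Verdict:
    proceed: bool
    failed_check: Optional[str] = None
    status: Optional[str] = None        # resulting run status, when the text names one


def check_before_step(ctx: StepContext) -> Verdict:
    # Intent lock: halt before the runner receives another request.
    if ctx.stop_condition_hit:
        return Verdict(False, "intent_lock", None)
    # Capability grant: a side effect without a matching grant blocks the run.
    if ctx.side_effect is not None and ctx.side_effect not in ctx.grants:
        return Verdict(False, "capability_grant", "blocked")
    # Iteration ceiling: reaching the bound interrupts the run.
    if ctx.iteration >= ctx.max_iterations:
        return Verdict(False, "iteration_ceiling", "interrupted")
    return Verdict(True)
```

In this sketch a denied side effect only reports the verdict; emitting the denial receipt would be the caller's job.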

Outputs & memory

Runner output is captured as a craik.runner_step_result, then persisted as a redacted craik.run_output. Run outputs can create reviewable craik.memory_proposal records — they cannot write durable facts directly.

Run-created proposals link back to the run for audit:

Field | Type | Purpose
task_id | | The task this proposal came out of.
run_id | | The specific run inside the task.
step_result_id | | The runner step that produced the observation.
handoff_id | id (optional) | The handoff that closed the run, when one exists.
evidence | evidence_reference[] | Pointers back to the captured run output and step result.

Blocked and failed steps are still inspectable, but they do not create memory proposals. Only completed and partial step results may propose, and only when the executor supplies explicit proposal specs.
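The proposal rule above can be illustrated with a small builder. This is a sketch, assuming string ids throughout: `StepResult`, `MemoryProposal`, and `build_proposals` are hypothetical names, the `claim` field stands in for whatever a proposal spec actually carries, and evidence references are reduced to plain id strings.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Only these step statuses may propose, per the rule above.
PROPOSING_STATUSES = frozenset({"completed", "partial"})


@dataclass(frozen=True)
class StepResult:
    id: str
    task_id: str
    run_id: str
    status: str


@dataclass(frozen=True)
class MemoryProposal:
    task_id: str
    run_id: str
    step_result_id: str
    evidence: Tuple[str, ...]         # pointers to the run output and step result
    claim: str                        # hypothetical payload taken from the spec
    handoff_id: Optional[str] = None  # set only when a handoff closed the run


def build_proposals(step: StepResult, run_output_id: str,
                    specs: Sequence[str],
                    handoff_id: Optional[str] = None) -> List[MemoryProposal]:
    """Turn explicit proposal specs into reviewable proposal records."""
    # Blocked and failed steps stay inspectable but never propose.
    if step.status not in PROPOSING_STATUSES:
        return []
    # No explicit specs means no proposals; nothing is inferred from output.
    return [
        MemoryProposal(
            task_id=step.task_id,
            run_id=step.run_id,
            step_result_id=step.id,
            evidence=(run_output_id, step.id),
            claim=spec,
            handoff_id=handoff_id,
        )
        for spec in specs
    ]
```

The builder only returns records for review; nothing in it writes a durable fact.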

Fixture vs live runners

Fixture execution

Uses FixtureStepRunner and deterministic step statuses.
For local tests, docs, and executor contract checks.
No credentials, no external side effects, byte-stable.
What CI exercises on every PR.
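The idea behind fixture execution can be shown with a scripted stand-in. This sketch does not reproduce the FixtureStepRunner interface, which the text does not describe; `ScriptedRunner` and `ScriptedResult` are hypothetical names that only illustrate deterministic, credential-free step statuses.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScriptedResult:
    step: int
    status: str


class ScriptedRunner:
    """Replays a fixed list of step statuses; no network, no side effects."""

    def __init__(self, statuses: Sequence[str]) -> None:
        self._statuses = tuple(statuses)
        self._cursor = 0

    def run_step(self) -> ScriptedResult:
        # Fail loudly rather than invent a status once the script is used up.
        if self._cursor >= len(self._statuses):
            raise IndexError("script exhausted")
        result = ScriptedResult(self._cursor, self._statuses[self._cursor])
        self._cursor += 1
        return result
```

Two runners built from the same script return identical results step for step, which is the property that makes such a runner suitable for contract checks.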

Live runner execution

Uses adapter-specific boundaries (Codex / Claude / Gemini today).
Same loop contract applies — only the runner backend differs.
Provider-specific details stay under structured, redacted metadata.
May require stricter grants depending on capability surface.

Recovery

Recovery starts by inspecting the persisted run, receipts, outputs, memory proposals, and handoff. Recovery must not replay side effects blindly. A recovered run re-checks policy, intent-lock stop conditions, and iteration limits before issuing another step request.
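One way to read the recovery contract above is as a filter over the side effects a resuming agent still intends to perform. This is a sketch under loud assumptions: `PersistedRun` and `plan_resume` are hypothetical, and treating "already has a receipt" as the signal that a side effect must not be replayed is an interpretation, not something the text specifies.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class PersistedRun:
    iteration: int
    max_iterations: int
    receipted_effects: FrozenSet[str]   # assumed: ids of effects that already have receipts
    pending_effects: Tuple[str, ...]    # effects the interrupted run still wanted


def plan_resume(run: PersistedRun,
                policy_allows: Callable[[str], bool],
                stop_condition_hit: bool) -> List[str]:
    """Return the side effects that may still be requested after recovery."""
    # Re-check intent-lock stop conditions and the iteration limit first.
    if stop_condition_hit or run.iteration >= run.max_iterations:
        return []
    return [
        effect
        for effect in run.pending_effects
        # Never replay an effect that already produced a receipt.
        if effect not in run.receipted_effects
        # Re-check policy; a grant may have changed since the interruption.
        and policy_allows(effect)
    ]
```

Passing the policy check in as a callable keeps the sketch independent of how grants are actually stored.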

Handoffs at terminal

Run handoffs summarize the terminal outcome through the existing craik.handoff contract. A run handoff should include:

Run status

completed / blocked / failed / interrupted with the last phase reached.

Captured outputs

Runner metadata and the step-result ids that produced durable artifacts.

Receipt ids

Provider, side-effect, memory, and policy receipts emitted during the run.

Diagnostics & risks

Residual risks, recovery guidance, and any context debt left for the next agent. A handoff must not claim work that did not complete.
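The handoff contents listed above could be assembled as follows. This is a sketch, not the craik.handoff schema: `RunHandoff` and `build_handoff` are hypothetical, and splitting step ids into completed and unfinished is one assumed way to avoid claiming work that did not complete.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

TERMINAL_STATUSES = frozenset({"completed", "blocked", "failed", "interrupted"})


@dataclass(frozen=True)
class RunHandoff:
    status: str                          # terminal run status
    last_phase: str                      # last phase reached
    completed_step_ids: Tuple[str, ...]  # steps that actually finished
    unfinished_step_ids: Tuple[str, ...] # reported as open, never as done
    receipt_ids: Tuple[str, ...]
    risks: Tuple[str, ...]


def build_handoff(status: str, last_phase: str,
                  step_statuses: Dict[str, str],
                  receipt_ids: Sequence[str],
                  risks: Sequence[str]) -> RunHandoff:
    # Handoffs summarize a terminal outcome, so reject anything else.
    if status not in TERMINAL_STATUSES:
        raise ValueError(f"not a terminal status: {status!r}")
    completed = tuple(s for s, st in step_statuses.items() if st == "completed")
    unfinished = tuple(s for s, st in step_statuses.items() if st != "completed")
    return RunHandoff(status, last_phase, completed, unfinished,
                      tuple(receipt_ids), tuple(risks))
```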

What's next

Read: Case files. The per-task pre-run brief the loop reads from before plan.
Reference: Runner step contracts. The typed shape every step request and step result carries.
Reference: Recovery mode. The continuity view a resuming agent reads before acting.
