Skip to main content
Version: MVP

Single-Agent Execution Loop

5 min readCore conceptUpdated 2026-05-19

What you'll learn

  • The six bounded phases that make up the v0.1 single-agent loop.
  • The safety boundaries Craik checks before each step.
  • How runner output becomes typed run output and (sometimes) memory proposals.
  • The recovery contract a resuming agent must honor.

Single-agent execution loop

The v0.1 contract that lets one runner work through a governed task without depending on an untracked chat transcript. The runner does the reasoning; Craik owns the durable boundary — run state, policy checks, receipts, step outputs, memory proposals, and recovery context.

Lifecycle

A run starts as a craik.task_run with status pending and phase plan. The executor moves through bounded steps:

Phase
Status it produces
Purpose
plan
→ running
Decide the next bounded action under the intent lock.
act
→ running
Perform an approved side effect or propose one.
observe
→ running
Capture output, diagnostics, receipts, and artifacts.
evaluate
→ running
Decide whether to stop, continue, block, or fail.
continue
→ running
Advance to another bounded iteration when needed.
stop
→ terminal
Finalize state, handoff, receipts, and recovery context.

Terminal statuses are completed, blocked, failed, and interrupted. Interrupted runs preserve enough local state for inspection and later recovery.

Safety boundaries

Before each step, Craik enforces three checks. None of them are advisory — violating any of them halts the loop.

Intent lock

If a step would trigger a configured stop condition, the run halts before the runner receives another request.

Capability grant

Side effects (shell, file write, memory write) need a matching grant. Denied side effects produce denial receipts and block the run.

Iteration ceiling

max_iterations bounds the loop. Reaching the bound interrupts the run instead of continuing indefinitely.

Outputs & memory

Runner output is captured as a craik.runner_step_result, then persisted as a redacted craik.run_output. Run outputs can create reviewable craik.memory_proposal records — they cannot write durable facts directly.

Run-created proposals link back to the run for audit:

Field
Type
Purpose
task_id
id
The task this proposal came out of.
run_id
id
The specific run inside the task.
step_result_id
id
The runner step that produced the observation.
handoff_id
id (optional)
The handoff that closed the run, when one exists.
evidence
evidence_reference[]
Pointers back to the captured run output and step result.

Blocked and failed steps are still inspectable, but they do not create memory proposals. Only completed and partial step results may propose, and only when the executor supplies explicit proposal specs.

Fixture vs live runners

Fixture execution

  • Uses FixtureStepRunner and deterministic step statuses.
  • For local tests, docs, and executor contract checks.
  • No credentials, no external side effects, byte-stable.
  • What CI exercises on every PR.

Live runner execution

  • Uses adapter-specific boundaries (Codex / Claude / Gemini today).
  • Same loop contract applies — only the runner backend differs.
  • Provider-specific details stay under structured, redacted metadata.
  • May require stricter grants depending on capability surface.

Recovery

Recovery starts by inspecting the persisted run, receipts, outputs, memory proposals, and handoff. Recovery must not replay side effects blindly. A recovered run re-checks policy, intent-lock stop conditions, and iteration limits before issuing another step request.

Handoffs at terminal

Run handoffs summarize the terminal outcome through the existing craik.handoff contract. A run handoff should include:

Run status

completed / blocked / failed / interrupted with the last phase reached.

Captured outputs

Runner metadata and the step-result ids that produced durable artifacts.

Receipt ids

Provider, side-effect, memory, and policy receipts emitted during the run.

Diagnostics & risks

Residual risks, recovery guidance, and any context debt left for the next agent — no claiming work that did not complete.

What's next