Learn Craik
Craik is a governed agent-runtime substrate — the operating layer that turns coding agents from isolated chat sessions into accountable project workers. This section explains what that means, the typed objects the runtime is built from, and how the project intends to grow.
If you'd rather jump straight to installing the CLI, head to Build → Getting started.
What's in this section
Featured · 01
Vision
Craik's central claim is that agents need an operating layer that gives them a shared model of the work, evidence-backed memory, explicit authority boundaries, structured handoffs, durable artifacts, and a way to resolve disagreement. Read this first — every other doc is downstream of it.
- durable agent runtime
- north star
- design principles
- initial wedge
North star
A new agent should be able to join a project and understand its current state better than a human who has been away for two weeks.
— Vision · §North Star
02 · Positioning
Product strategy
Why Craik is a durable agent runtime, not a framework. The market wedge, the agent-runner strategy (Codex / Claude / Gemini as first-class adapters), the MIT license rationale, and the patterns Craik borrows from local runtimes versus the patterns it adds on top.
Craik should not be positioned as another agent framework. Product strategy · §Agent Runner Strategy
- runner strategy
- license
- gateway ergonomics
- multi-agent coordination
03 · What's distinct
Differentiators
The features that keep the roadmap from collapsing into basic CLI plumbing. Evidence-first execution, the assumption ledger, the belief-promotion lifecycle, context budgeting as policy, and end-to-end run reproducibility.
No durable assertion without evidence. Differentiators · §Evidence-First Execution
- evidence-first
- assumption ledger
- belief promotion
- reproducibility
04 · What ships
Features
The implementable feature surface — every MVP behavior with acceptance criteria. Project registry, case-file assembler, policy envelope, capability grants, runner adapters, work graph, receipts, handoffs. Read this to know exactly what v0.1 ships.
Read-only tasks default to repo read, memory read, and receipt write. Implementation tasks require explicit write grants. Features · §Policy Envelope
- case files
- policy envelope
- receipts
- handoffs
- work graph
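The default-grant rule quoted above can be sketched in a few lines. This is an illustration only, not Craik's policy envelope schema: the grant names (`repo.read`, `memory.read`, `receipt.write`, `repo.write`) and the `PolicyEnvelope` and `envelope_for` names are hypothetical, chosen to mirror the prose.

```python
from dataclasses import dataclass

# Hypothetical grant names; the source describes these defaults only in prose.
READ_ONLY_DEFAULTS = frozenset({"repo.read", "memory.read", "receipt.write"})


@dataclass(frozen=True)
class PolicyEnvelope:
    """Illustrative stand-in for a policy envelope, not the real contract."""

    task_id: str
    grants: frozenset = READ_ONLY_DEFAULTS

    def allows(self, capability: str) -> bool:
        # A capability is usable only if it was granted explicitly.
        return capability in self.grants


def envelope_for(task_id: str, extra_grants: frozenset = frozenset()) -> PolicyEnvelope:
    """Start from the read-only defaults and add only explicitly granted extras."""
    return PolicyEnvelope(task_id, READ_ONLY_DEFAULTS | frozenset(extra_grants))
```

Under these assumptions, a read-only task gets `envelope_for("task-1")`, while an implementation task has to name its write grant, for example `envelope_for("task-2", frozenset({"repo.write"}))`.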
05 · How it composes
Architecture
The seven runtime layers — gateway, project model, orchestration, runner adapters, capability, memory, work graph, experience — plus the typed contracts that hold them together. The map for anyone extending Craik or integrating a new runner.
The layers should remain separable so Craik can support different model providers, tool environments, and memory backends without weakening the product thesis. Architecture · §Layers
- seven layers
- runtime flow
- core contracts
- borrowed patterns
Foundation · 01
Project model
The runner-readable view Craik builds from a registered repository. Combines local configuration, repository state, documentation boundaries, memory backend posture, policy posture, and known continuity records into a single typed object every Craik component speaks. Case files, intent locks, and onboarding payloads are all drawn against it.
- mutable docs vs immutable evidence
- policy posture
- continuity
- onboarding payload
Operational by design
The model tells an agent which repository it is entering, which docs are mutable, which paths are immutable evidence, which memory backend is configured, and which next actions are currently allowed.
— Project model · §Overview
02 · Pre-run brief
Case files
The per-task pre-run brief. Evidence, assumptions, stale-risk markers, context-budget metadata, and a verification plan — sealed when built, addressable for audit, and the input every runner reads first.
A case file is not a memory store, and it is not a transcript. Case files · §Definition
- evidence
- assumptions
- context budget
- verification plan
03 · Bounded iteration
Single-agent execution loop
Plan → Act → Observe → Evaluate → Continue or Stop. The v0.1 loop lets a runner work through a governed task without depending on an untracked chat transcript. Craik owns the durable boundary: run state, policy checks, receipts, step outputs, and recovery context.
Side effects are policy-gated. A step such as shell execution must have a matching capability grant before it runs. Single-agent loop · §Safety Boundaries
- plan / act / observe / evaluate
- step results
- recovery
- intent-lock checks
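To make the loop and its policy gate concrete, here is a minimal sketch assuming a step carries a single capability name and the run keeps a flat set of grants. `Step`, `RunLog`, `run_loop`, and the capability strings are hypothetical; only the status words come from the run-state description in this section.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    capability: str  # hypothetical capability name, e.g. "shell.exec"
    action: str      # human-readable description of the step


@dataclass
class RunLog:
    grants: set
    results: list = field(default_factory=list)
    status: str = "running"


def run_loop(plan: list, log: RunLog) -> RunLog:
    """Walk a plan step by step and stop at the first denied capability."""
    for step in plan:                                   # Plan: take the next step
        if step.capability not in log.grants:           # policy gate before acting
            log.results.append(f"denied:{step.action}")  # Observe: record the denial
            log.status = "blocked"                      # Evaluate: stop the run
            return log
        log.results.append(f"ok:{step.action}")         # Act and Observe (simulated)
    log.status = "completed"                            # nothing left to continue with
    return log
```

The step outputs are recorded in the log rather than in a chat transcript, so a blocked run still leaves an inspectable trail of what succeeded and what was denied.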
04 · Durable accountability
Receipts
A concise, durable record for every action that mattered. Each receipt names actor, credential, target, capability, reason, and result — joinable by task, policy envelope, and handoff. Redaction guard runs on every persistence path.
Every receipt names who acted, what they used, what they touched, why it happened, and how it ended. Receipts · §Definition
- actor + credential
- redaction
- task linkage
- audit trail
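A receipt with the six named fields, passed through a redaction step before persistence, might look like the sketch below. The field names follow the prose above; the `Receipt` class, the `redact` and `persistable` helpers, and the secret pattern are assumptions for illustration and are far simpler than a real redaction guard would need to be.

```python
import re
from dataclasses import asdict, dataclass

# Hypothetical pattern: catches only obvious "token=..." style assignments.
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|token|password)=\S+")


def redact(text: str) -> str:
    """Replace the value of anything that looks like a secret assignment."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


@dataclass(frozen=True)
class Receipt:
    task_id: str     # join key back to the task
    actor: str       # who acted
    credential: str  # a reference to a credential, never the secret itself
    target: str      # what was touched
    capability: str  # what was used
    reason: str      # why it happened
    result: str      # how it ended


def persistable(receipt: Receipt) -> dict:
    """Run every field through the redaction step before it is stored."""
    return {key: redact(value) for key, value in asdict(receipt).items()}
```

Routing every write through one function such as `persistable` is one way to keep redaction on every persistence path instead of relying on each caller to remember it.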
05 · Continuity
Handoffs
Machine-readable run summaries the next agent — human or model — picks up from. Status, completed actions, validation, assumptions, context debt, policy exceptions, receipts, and memory proposals — plus a self-audit checklist that keeps incomplete runs honest.
A handoff is not a transcript and not a chat log. It's the concise continuity record that lets the next actor pick up. Handoffs · §Definition
- structured + markdown
- self-audit
- policy exceptions
- next-step contract
06 · Connected state
Work graph
A projection over the runtime objects already in $CRAIK_HOME/state/. Tasks, case files, handoffs, receipts, memory proposals, evidence, assumptions, and contradictions become queryable nodes connected by typed edges. Deterministic, redacted, exportable.
The graph isn't a separate data store; it's a projection over the existing typed objects in $CRAIK_HOME/state/. Work graph · §Definition
- nodes & edges
- graph export
- operator views
- cross-cutting queries
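The idea of a graph as a projection, rather than a second data store, can be sketched as a pure function over existing records. The record shape (`id` plus a `links` list of `(kind, target)` pairs), the `recorded_for` edge label, and the function names are all hypothetical; sorting stands in for the determinism the card mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Edge:
    source: str
    kind: str    # typed edge label, e.g. "recorded_for" (hypothetical)
    target: str


def project(records: list) -> tuple:
    """Derive nodes and typed edges from existing records without storing anything new."""
    nodes = sorted({record["id"] for record in records})
    edges = sorted(
        Edge(record["id"], kind, target)
        for record in records
        for kind, target in record.get("links", [])
    )
    return nodes, edges


def linked_to(target: str, kind: str, edges: list) -> list:
    """Cross-cutting query: every node with an edge of this kind to the target."""
    return [edge.source for edge in edges if edge.kind == kind and edge.target == target]
```

Because the output depends only on the input records and is sorted, two runs over the same state produce the same export.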
07 · Governed truth
Memory & Stigmem
Memory is governed project state, not a transcript cache. Agent-created updates default to proposals with evidence; direct writes need the memory.write grant. Craik owns orchestration; Stigmem owns the durable fact substrate.
Agent-created memory updates default to proposals: durable, evidence-backed candidate facts that remain reviewable until a human (or a policy grant) promotes them. Memory & Stigmem · §Proposal-First
- proposal-first
- evidence + scope
- direct-write grant
- Stigmem ownership
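A proposal-first write path can be sketched as follows. Only the `memory.write` grant name comes from the text above; `Proposal`, `MemoryStore`, `submit`, and the return strings are hypothetical, and the in-memory store is a stand-in for whatever Stigmem actually provides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposal:
    key: str
    value: str
    evidence: tuple  # references to evidence, e.g. file paths or receipt ids


@dataclass
class MemoryStore:
    facts: dict = field(default_factory=dict)
    proposals: list = field(default_factory=list)


def submit(store: MemoryStore, key: str, value: str, evidence: list, grants: set) -> str:
    """Queue a proposal by default; write directly only with the memory.write grant."""
    if not evidence:
        raise ValueError("memory updates need at least one evidence reference")
    if "memory.write" in grants:
        store.facts[key] = value
        return "written"
    store.proposals.append(Proposal(key, value, tuple(evidence)))
    return "proposed"
```

The default branch is the proposal queue, so an agent that forgets to ask for a grant produces a reviewable candidate fact rather than a durable one.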
08 · Runtime guardrails
Governance
Policy envelopes, capability grants, immutable paths, redaction, receipt obligations, memory defaults, and the policy gate — all typed runtime objects, not advisory configuration. Strict by default; fail-open is opt-in only.
Craik treats governance as a runtime concern. Policy envelopes, capability grants, and immutable paths are first-class records. Governance · §Definition
- policy profiles
- capability grants
- fail-open
- redaction
09 · Accepted scope
Intent locks
The runtime's accepted interpretation of a task — explicit, durable, and separate from the original request. In-scope, out-of-scope, allowed autonomy, stop conditions, and scope-change rules. Every case file and handoff carries the lock id.
The lock is what the runtime committed to before the work began — every later decision can be checked against it. Intent locks · §Why bother?
- accepted interpretation
- in-scope / out-of-scope
- stop conditions
- scope-change rules
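One way to picture checking a later decision against the lock is a path-scope test. Modelling scope as glob patterns is purely an assumption for this sketch, as are the `IntentLock` fields shown and the `permits` method; the real lock also carries allowed autonomy, stop conditions, and scope-change rules, which are omitted here.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class IntentLock:
    lock_id: str
    in_scope: tuple      # glob patterns the task may touch (assumed representation)
    out_of_scope: tuple  # glob patterns that are explicitly excluded

    def permits(self, path: str) -> bool:
        """Exclusions win; anything not explicitly in scope is refused."""
        if any(fnmatch(path, pattern) for pattern in self.out_of_scope):
            return False
        return any(fnmatch(path, pattern) for pattern in self.in_scope)
```

Because the lock is frozen and carries its own id, a case file or handoff can cite `lock_id` and any reviewer can replay the same check later.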
Foundation · 01
Runtime contracts overview
The product spine. Every persisted contract carries
schema and version fields; breaking changes
require a new version and a migration path. Task requests, case files,
policy envelopes, capability grants, capability receipts, handoffs,
proposed facts, contradiction reports, verification results, and
work-graph events — all live here.
- versioning
- shape examples
- migration policy
- adapter integration
Why a spine
Craik should be built around stable, versioned contracts. The contracts are the product spine: adapters, agents, memory backends, and future plugins should integrate through them.
— Runtime contracts · §Intro
02 · Strict typing
Schemas
Every contract is a strict Pydantic model. craik schema list enumerates them; craik schema show <name> prints JSON Schema. Unknown fields are rejected so adapters and plugins can't silently depend on accidental payload shape.
Unknown fields are rejected so adapters, memory backends, and future plugins do not silently depend on accidental payload shape. Schemas · §Intro
- pydantic models
- schema CLI
- JSON Schema export
- strict validation
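The effect of strict validation can be shown without Pydantic. The sketch below uses a stdlib dataclass as a stand-in, so it is not how Craik defines its models; in Pydantic v2 the comparable behaviour comes from setting `extra="forbid"` in the model config. The `TaskRequest` fields, the `craik.task_request` schema name, and `parse_task_request` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRequest:
    schema: str   # contract name, e.g. "craik.task_request" (hypothetical)
    version: str  # contract version string (format assumed)
    task_id: str
    summary: str


def parse_task_request(payload: dict) -> TaskRequest:
    """Reject payloads with unknown or missing fields instead of ignoring them."""
    try:
        return TaskRequest(**payload)
    except TypeError as exc:
        # Dataclass constructors raise TypeError for unexpected or missing keywords.
        raise ValueError(f"invalid task request payload: {exc}") from exc
```

An adapter that sends an extra `priority` key gets an immediate error under this scheme, rather than a payload that appears to work until the field is renamed.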
03 · Repo wiring
Project profile
The craik.project_profile shape: stable id, repo paths, default branch, docs and immutable paths, memory backend and scope. Inputs to every case-file build and onboarding payload.
Project profiles describe repositories Craik can reason about. Project profile · §Intro
- repo metadata
- docs boundaries
- memory backend
- git detection
04 · Inspectable runs
Run state
craik.task_run links task request, case file, policy envelope, runner identity, intent lock, receipts, and final handoff. Status (pending → running → completed/blocked/failed/interrupted) and phase (plan/act/observe/evaluate/continue/stop) are both first-class fields.
It gives later loop orchestration an inspectable record without depending on an untracked chat transcript. Run state · §Intro
- task_run
- status + phase
- recovery
- step results
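The status progression described above can be sketched as a small transition table. The status values come from the card; treating the four end states as terminal is an assumption of this sketch (a real runtime may allow recovery from some of them), and `Status`, `ALLOWED`, and `advance` are hypothetical names.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    BLOCKED = "blocked"
    FAILED = "failed"
    INTERRUPTED = "interrupted"


# Assumption: end states have no outgoing transitions in this sketch.
ALLOWED = {
    Status.PENDING: {Status.RUNNING},
    Status.RUNNING: {Status.COMPLETED, Status.BLOCKED, Status.FAILED, Status.INTERRUPTED},
}


def advance(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the transition is not in the table."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

Keeping status as an explicit field with checked transitions is what lets an operator inspect a run without replaying the conversation that produced it.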
05 · Typed specialist output
Worker results
craik.worker_result preserves role-specific specialist output: findings with severity and evidence, artifacts, assumptions, risks, proposed actions, contradiction ids, receipts, diagnostics. Conflicting specialist outputs stay conflicting; review decides later.
Specialist outputs should remain typed even when agents disagree. Do not flatten conflicting results into a single consensus. Worker results · §Typed outputs
- typed findings
- severity + evidence
- contradiction preservation
- multi-agent
06 · MVP hardening
Failure modes
The fail-closed posture. Prompt-injection containment, secret rejection at persistence, denied-capability handling, fail-open visibility, automation stops, recovery requirements — and an explicit list of paths the MVP does not claim (live provider calls as default, broad daemon mode, dashboards, direct durable memory writes).
The runtime should preserve enough state to recover or review a failed run without silently promoting uncertain work to durable facts. Failure modes · §Intro
- fail-closed
- prompt injection
- secret rejection
- MVP boundaries
Active · 01
MVP roadmap
The robust 0.x.0 MVP target — not 1.0.0. Names
the readiness work that affects trust, release hygiene, documentation
accuracy, provider support, and package publication. Read this when you
want to know what blocks the first public release.
- OIDC operator identity
- credential profiles
- OpenAI + Anthropic support
- release gates
Definition of done
The MVP is complete when Craik can run one real software-delivery workflow end to end with OIDC-authenticated operators, typed credential profiles, policy-enforced side effects, durable receipts, a useful handoff, accurate documentation, and package-release quality gates.
— MVP roadmap · §MVP Definition
02 · One workflow
MVP plan
The original MVP scope: prove one complete workflow instead of a broad platform shell. The accepted primary demo is Stigmem documentation and state reconciliation — the workflow CI exercises end-to-end.
The MVP should prove one complete workflow instead of building a broad platform shell. MVP plan · §MVP Goal
- stigmem demo
- governed workflow
- handoff backed by memory
- capability receipts
03 · Long view
Roadmap
The broader trajectory: smallest useful runtime first, then Stigmem-native memory, runner adapters, multi-agent coordination, instruction distillation, community extensions. Seven roadmap rules keep features from shipping without docs, evidence, and policy posture.
Every roadmap item must produce implementation, tests or validation, and documentation. Craik should not ship features that only exist as code or only exist as strategy. Roadmap · §Roadmap Rules
- seven rules
- CLI first
- evidence before memory
- strict-by-default
04 · Pass / fail snapshot
Release readiness · v0.1.0
The concrete checklist validated on 2026-05-17 against main. CI green, CodeQL green, schema and contract regressions verified. Repository-owned readiness is complete; remaining work is the protected publication process at tag time.
Repository-owned readiness checks are complete. The remaining work is outside the repository: create the v0.1.0 tag and run the protected publication process when the maintainer is ready. Release readiness · §Summary
- CI green
- CodeQL green
- schema regressions
- publication gate
05 · What's not yet
Limitations
The honest scope boundary. Lists the v0.1 end-to-end surfaces that work today (home init, project registration, case-file assembly, local state inspection, policy gates, foreground gateway health service) and the deliberately post-MVP surfaces (hosted gateway dispatch, operator dashboards, broad live tool execution).
Several surfaces are not yet end-to-end production workflows. Limitations · §Intro
- working today
- post-MVP scope
- v0.12 contract coverage
- honesty boundary
06 · How it gets built
Implementation plan
The accepted stack and build sequence. Python 3.12+, Typer CLI, Pydantic schemas, SQLite for local state, stdlib HTTP for first integrations, pytest for tests, ruff and mypy for quality. The sequence of milestones that gets v0.1 to release.
This plan turns the Craik concept into a buildable sequence. Implementation plan · §Intro
- python 3.12+
- typer + pydantic
- milestones
- quality gates
- ADR 0001 · Accepted
MVP runner scope
Sets the public framing: the MVP ships case-file assembly, policy envelopes, prompt compilation, receipts, handoffs, and one governed workflow — not unbounded tool execution.
Read decision
- ADR 0002 · Accepted
Provider transport & mode families
OpenAI Responses, Anthropic Messages, and OAI-compatible Chat Completions stay as separate transport families — not collapsed into a single adapter — so tool, streaming, usage, and retry differences stay explicit.
Read decision
- ADR 0003 · Accepted
Secret handling
Receipts, handoffs, case files, provider configs, and local store records are scrubbed through a central redaction guard before persistence. Secret material is referenced — never copied.
Read decision
- ADR 0004 · Accepted
Policy envelope shape
The policy envelope binds actor, task, profile, grant requirements, redaction posture, and receipt obligations into one typed record that travels with every governed action.
Read decision
- ADR 0005 · Accepted
Receipts & handoffs as public contracts
Receipts and handoffs sit at the boundary of runtime, memory, docs, and operator workflows. They are versioned public contracts — adapters and plugins integrate against them without renegotiating shape.
Read decision
- ADR 0006 · Accepted
Package & runtime layout
Splits the historically flat runtime namespace into ownership-bearing modules (providers, memory, policy, work execution, companions, channels, voice, sandboxing, project workflows) so change rates and risk profiles can diverge cleanly.
Read decision
- ADR 0007 · Accepted
Credential & identity architecture
Provider credentials and operator identity are governance inputs. Every receipt names which human authorized work, which credential carried the call, which policy allowed it, and which grant made the credential usable.
Read decision
- Index · Catalog
ADR index
The full catalog of accepted decisions and the conventions for proposing new ones, retiring old ones, and citing them from reference docs.
Browse all
Where to go next
Once the concepts are clear, choose your path:
- Install and run something → Build · Getting started
- Integrate a runner or provider → Build · Connecting runners
- Govern execution → Secure · Governance fundamentals
- Run and inspect the system → Operate · Operator views