Version: MVP

Learn Craik

Craik is a governed agent-runtime substrate — the operating layer that turns coding agents from isolated chat sessions into accountable project workers. This section explains what that means, the typed objects the runtime is built from, and how the project intends to grow.

If you'd rather jump straight to installing the CLI, head to Build → Getting started.

What's in this section

The product

A durable agent runtime — not another framework.

Start here to understand the thesis: agent work needs an operating layer — not more clever prompting — and the five docs below build that argument from the north star down through the typed contracts. Every other section of these docs is downstream of this one.

Featured · 01

Vision

Craik's central claim is that agents need an operating layer that gives them a shared model of the work, evidence-backed memory, explicit authority boundaries, structured handoffs, durable artifacts, and a way to resolve disagreement. Read this first — every other doc is downstream of it.

  • durable agent runtime
  • north star
  • design principles
  • initial wedge
Read the vision

North star

A new agent should be able to join a project and understand its current state better than a human who has been away for two weeks.

— Vision · §North Star

  1. 02 · Positioning

    Product strategy

    Why Craik is a durable agent runtime, not a framework. The market wedge, the agent-runner strategy (Codex / Claude / Gemini as first-class adapters), the MIT license rationale, and the patterns Craik borrows from local runtimes versus the patterns it adds on top.

    Craik should not be positioned as another agent framework. Product strategy · §Agent Runner Strategy

    • runner strategy
    • license
    • gateway ergonomics
    • multi-agent coordination

    For: founders · product · 5 min read

  2. 03 · What's distinct

    Differentiators

    The features that keep the roadmap from collapsing into basic CLI plumbing. Evidence-first execution, the assumption ledger, the belief-promotion lifecycle, context budgeting as policy, and end-to-end run reproducibility.

    No durable assertion without evidence. Differentiators · §Evidence-First Execution

    • evidence-first
    • assumption ledger
    • belief promotion
    • reproducibility

    For: engineers · reviewers · 10 min read

  3. 04 · What ships

    Features

    The implementable feature surface — every MVP behavior with acceptance criteria. Project registry, case-file assembler, policy envelope, capability grants, runner adapters, work graph, receipts, handoffs. Read this to know exactly what v0.1 ships.

    Read-only tasks default to repo read, memory read, and receipt write. Implementation tasks require explicit write grants. Features · §Policy Envelope

    • case files
    • policy envelope
    • receipts
    • handoffs
    • work graph

    For: implementers · 10 min read

  4. 05 · How it composes

    Architecture

    The seven runtime layers — gateway, project model, orchestration, runner adapters, capability, memory, work graph, experience — plus the typed contracts that hold them together. The map for anyone extending Craik or integrating a new runner.

    The layers should remain separable so Craik can support different model providers, tool environments, and memory backends without weakening the product thesis. Architecture · §Layers

    • seven layers
    • runtime flow
    • core contracts
    • borrowed patterns

    For: architects · contributors · 4 min read
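The policy-envelope default quoted in the Features card (read-only tasks get repo read, memory read, and receipt write; implementation tasks need explicit write grants) can be sketched as below. This is a minimal illustration, not Craik's implementation: the grant strings, `grants_for`, and `is_allowed` are hypothetical names chosen for the example.

```python
# Hypothetical grant names: only "memory.write" is named in these docs.
READ_ONLY_DEFAULTS = frozenset({"repo.read", "memory.read", "receipt.write"})


def grants_for(explicit: frozenset[str] = frozenset()) -> frozenset[str]:
    """Combine the read-only defaults with any explicitly granted capabilities."""
    return READ_ONLY_DEFAULTS | explicit


def is_allowed(capability: str, grants: frozenset[str]) -> bool:
    """A capability is usable only when a matching grant is present."""
    return capability in grants


# A read-only task starts with the defaults and nothing else.
read_only = grants_for()

# An implementation task must be handed its write grant explicitly.
implementation = grants_for(frozenset({"repo.write"}))
```

Write access is never implied by the task kind in this sketch; it appears only when a grant is passed in, which mirrors the "explicit write grants" wording.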

Core concepts

Nine typed objects every other doc speaks.

Each concept below maps to a runtime object the rest of the docs reference by name. Read in order on a first pass — Build, Operate, and Secure all assume you know what these are. Project model is foundational; everything else composes on top.

Foundation · 01

Project model

The runner-readable view Craik builds from a registered repository. Combines local configuration, repository state, documentation boundaries, memory backend posture, policy posture, and known continuity records into a single typed object every Craik component speaks. Case files, intent locks, and onboarding payloads are all drawn against it.

  • mutable docs vs immutable evidence
  • policy posture
  • continuity
  • onboarding payload
Read the project model

Operational by design

The model tells an agent which repository it is entering, which docs are mutable, which paths are immutable evidence, which memory backend is configured, and which next actions are currently allowed.

— Project model · §Overview
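As a rough illustration of the five things the overview says the model tells an agent, the sketch below uses a frozen dataclass. Every field name, the `is_immutable` helper, and the sample values are assumptions made for this example; the actual typed object is defined in the project model doc.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ProjectModel:
    # Illustrative field names, not Craik's schema.
    repository: str
    mutable_docs: tuple[str, ...]
    immutable_paths: tuple[str, ...]
    memory_backend: str
    allowed_next_actions: tuple[str, ...]

    def is_immutable(self, path: str) -> bool:
        """True when path is an immutable evidence root or sits under one."""
        target = PurePosixPath(path)
        return any(
            target == PurePosixPath(root) or PurePosixPath(root) in target.parents
            for root in self.immutable_paths
        )


# Sample values are placeholders for the example only.
model = ProjectModel(
    repository="example-repo",
    mutable_docs=("docs",),
    immutable_paths=("evidence",),
    memory_backend="local",
    allowed_next_actions=("assemble_case_file",),
)
```

An agent holding such an object could check `model.is_immutable(path)` before proposing an edit, rather than inferring the boundary from prose.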

  1. 02 · Pre-run brief

    Case files

    The per-task pre-run brief. Evidence, assumptions, stale-risk markers, context-budget metadata, and a verification plan — sealed when built, addressable for audit, and the input every runner reads first.

    A case file is not a memory store, and it is not a transcript. Case files · §Definition

    • evidence
    • assumptions
    • context budget
    • verification plan

    For: runners · reviewers · 6 min read

  2. 03 · Bounded iteration

    Single-agent execution loop

    Plan → Act → Observe → Evaluate → Continue or Stop. The v0.1 loop lets a runner work through a governed task without depending on an untracked chat transcript. Craik owns the durable boundary: run state, policy checks, receipts, step outputs, and recovery context.

    Side effects are policy-gated. A step such as shell execution must have a matching capability grant before it runs. Single-agent loop · §Safety Boundaries

    • plan / act / observe / evaluate
    • step results
    • recovery
    • intent-lock checks

    For: implementers · 4 min read

  3. 04 · Durable accountability

    Receipts

    A concise, durable record for every action that mattered. Each receipt names actor, credential, target, capability, reason, and result — joinable by task, policy envelope, and handoff. Redaction guard runs on every persistence path.

    Every receipt names who acted, what they used, what they touched, why it happened, and how it ended. Receipts · §Definition

    • actor + credential
    • redaction
    • task linkage
    • audit trail

    For: auditors · operators · 5 min read

  4. 05 · Continuity

    Handoffs

    Machine-readable run summaries the next agent — human or model — picks up from. Status, completed actions, validation, assumptions, context debt, policy exceptions, receipts, and memory proposals — plus a self-audit checklist that keeps incomplete runs honest.

    A handoff is not a transcript and not a chat log. It's the concise continuity record that lets the next actor pick up. Handoffs · §Definition

    • structured + markdown
    • self-audit
    • policy exceptions
    • next-step contract

    For: runners · humans · 5 min read

  5. 06 · Connected state

    Work graph

    A projection over the runtime objects already in $CRAIK_HOME/state/. Tasks, case files, handoffs, receipts, memory proposals, evidence, assumptions, and contradictions become queryable nodes connected by typed edges. Deterministic, redacted, exportable.

    The graph isn't a separate data store — it's a projection over the existing typed objects in $CRAIK_HOME/state/. Work graph · §Definition

    • nodes & edges
    • graph export
    • operator views
    • cross-cutting queries

    For: reviewers · 5 min read

  6. 07 · Governed truth

    Memory & Stigmem

    Memory is governed project state, not a transcript cache. Agent-created updates default to proposals with evidence; direct writes need the memory.write grant. Craik owns orchestration; Stigmem owns the durable fact substrate.

    Agent-created memory updates default to proposals: durable, evidence-backed candidate facts that remain reviewable until a human (or a policy grant) promotes them. Memory & Stigmem · §Proposal-First

    • proposal-first
    • evidence + scope
    • direct-write grant
    • Stigmem ownership

    For: memory operators · 6 min read

  7. 08 · Runtime guardrails

    Governance

    Policy envelopes, capability grants, immutable paths, redaction, receipt obligations, memory defaults, and the policy gate — all typed runtime objects, not advisory configuration. Strict by default; fail-open is opt-in only.

    Craik treats governance as a runtime concern. Policy envelopes, capability grants, and immutable paths are first-class records. Governance · §Definition

    • policy profiles
    • capability grants
    • fail-open
    • redaction

    For: policy operators · 6 min read

  8. 09 · Accepted scope

    Intent locks

    The runtime's accepted interpretation of a task — explicit, durable, and separate from the original request. In-scope, out-of-scope, allowed autonomy, stop conditions, and scope-change rules. Every case file and handoff carries the lock id.

    The lock is what the runtime committed to before the work began — every later decision can be checked against it. Intent locks · §Why bother?

    • accepted interpretation
    • in-scope / out-of-scope
    • stop conditions
    • scope-change rules

    For: task owners · 5 min read
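Two of the concepts above fit together naturally: a side-effecting step needs a matching capability grant before it runs, and every action that matters leaves a receipt naming actor, credential, target, capability, reason, and result. The sketch below shows one way that pairing could look. The `Receipt` field types, the `run_step` helper, the `"denied"` and `"ok"` result values, and the `shell.exec` grant name are all assumptions for illustration.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    # The six fields named in the Receipts card; the types are assumptions.
    actor: str
    credential: str  # a reference to a credential, never the secret itself
    target: str
    capability: str
    reason: str
    result: str


def run_step(
    actor: str,
    credential: str,
    target: str,
    capability: str,
    reason: str,
    grants: frozenset[str],
    action: Callable[[], None],
) -> Receipt:
    """Run action only when capability is granted; return a receipt either way."""
    if capability not in grants:
        # Fail closed: the side effect never runs without a matching grant.
        return Receipt(actor, credential, target, capability, reason, "denied")
    action()
    return Receipt(actor, credential, target, capability, reason, "ok")
```

Because a receipt is returned for the denied case too, a reviewer can see attempted actions as well as completed ones.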

Runtime contracts

The typed spine every component speaks.

Six contracts the runtime persists, versions, and validates. Adapters, memory backends, and future plugins integrate through these — break one and the policy gate fails closed. Read this when you need to write a policy, ship an adapter, or interpret a receipt.

Foundation · 01

Runtime contracts overview

The product spine. Every persisted contract carries schema and version fields; breaking changes require a new version and a migration path. Task requests, case files, policy envelopes, capability grants, capability receipts, handoffs, proposed facts, contradiction reports, verification results, and work-graph events — all live here.

  • versioning
  • shape examples
  • migration policy
  • adapter integration
Read the contracts

Why a spine

Craik should be built around stable, versioned contracts. The contracts are the product spine: adapters, agents, memory backends, and future plugins should integrate through them.

— Runtime contracts · §Intro
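The overview states that every persisted contract carries schema and version fields, and the Schemas doc in this section adds that unknown fields are rejected. The docs describe the real contracts as strict Pydantic models; the standard-library sketch below only illustrates the rejection behavior. `ExampleContract`, `parse_strict`, `is_valid`, and the `craik.example` schema name are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class ExampleContract:
    # Hypothetical contract: only the schema and version fields come from the docs.
    schema: str
    version: str
    summary: str


def parse_strict(payload: dict[str, Any]) -> ExampleContract:
    """Build the contract, rejecting unknown and missing fields."""
    known = {f.name for f in fields(ExampleContract)}
    unknown = sorted(set(payload) - known)
    if unknown:
        raise ValueError(f"unknown fields rejected: {unknown}")
    missing = sorted(known - set(payload))
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return ExampleContract(**payload)


def is_valid(payload: dict[str, Any]) -> bool:
    """Convenience wrapper: True when the payload parses strictly."""
    try:
        parse_strict(payload)
    except ValueError:
        return False
    return True
```

Rejecting extras at parse time means an adapter that starts sending an undocumented field fails loudly instead of becoming an accidental dependency.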

  1. 02 · Strict typing

    Schemas

    Every contract is a strict Pydantic model. craik schema list enumerates them; craik schema show <name> prints JSON Schema. Unknown fields are rejected so adapters and plugins can't silently depend on accidental payload shape.

    Unknown fields are rejected so adapters, memory backends, and future plugins do not silently depend on accidental payload shape. Schemas · §Intro

    • pydantic models
    • schema CLI
    • JSON Schema export
    • strict validation

    For: integrators · Reference

  2. 03 · Repo wiring

    Project profile

    The craik.project_profile shape: stable id, repo paths, default branch, docs and immutable paths, memory backend and scope. Inputs to every case-file build and onboarding payload.

    Project profiles describe repositories Craik can reason about. Project profile · §Intro

    • repo metadata
    • docs boundaries
    • memory backend
    • git detection

    For: operators · Reference

  3. 04 · Inspectable runs

    Run state

    craik.task_run links task request, case file, policy envelope, runner identity, intent lock, receipts, and final handoff. Status (pending → running → completed/blocked/failed/interrupted) and phase (plan/act/observe/evaluate/continue/stop) are both first-class fields.

    It gives later loop orchestration an inspectable record without depending on an untracked chat transcript. Run state · §Intro

    • task_run
    • status + phase
    • recovery
    • step results

    For: implementers · Reference

  4. 05 · Typed specialist output

    Worker results

    craik.worker_result preserves role-specific specialist output: findings with severity and evidence, artifacts, assumptions, risks, proposed actions, contradiction ids, receipts, diagnostics. Conflicting specialist outputs stay conflicting — review decides later.

    Specialist outputs should remain typed even when agents disagree. Do not flatten conflicting results into a single consensus. Worker results · §Typed outputs

    • typed findings
    • severity + evidence
    • contradiction preservation
    • multi-agent

    For: orchestration · Reference

  5. 06 · MVP hardening

    Failure modes

    The fail-closed posture. Prompt-injection containment, secret rejection at persistence, denied-capability handling, fail-open visibility, automation stops, recovery requirements — and an explicit list of paths the MVP does not claim (live provider calls as default, broad daemon mode, dashboards, direct durable memory writes).

    The runtime should preserve enough state to recover or review a failed run without silently promoting uncertain work to durable facts. Failure modes · §Intro

    • fail-closed
    • prompt injection
    • secret rejection
    • MVP boundaries

    For: security · reviewers · Reference
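The status and phase values listed in the Run state card can be sketched as two enums plus a transition check. The value names come from that card; the transition table is an assumption. In particular, this page does not say whether blocked or interrupted runs can resume, so the sketch treats all four end states as terminal.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    BLOCKED = "blocked"
    FAILED = "failed"
    INTERRUPTED = "interrupted"


class Phase(Enum):
    PLAN = "plan"
    ACT = "act"
    OBSERVE = "observe"
    EVALUATE = "evaluate"
    CONTINUE = "continue"
    STOP = "stop"


# Assumed transitions: pending -> running -> one of four end states.
ALLOWED: dict[Status, frozenset[Status]] = {
    Status.PENDING: frozenset({Status.RUNNING}),
    Status.RUNNING: frozenset(
        {Status.COMPLETED, Status.BLOCKED, Status.FAILED, Status.INTERRUPTED}
    ),
}


def can_advance(current: Status, target: Status) -> bool:
    """True when the assumed transition table permits current -> target."""
    return target in ALLOWED.get(current, frozenset())


def advance(current: Status, target: Status) -> Status:
    """Return target, or raise if the transition is not permitted."""
    if not can_advance(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping status and phase as separate fields lets a record say, for example, that a run is still running while its loop is in the evaluate phase.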

Status & roadmap

Where Craik is today — and where it is going.

Honest about what's not yet built. Six docs together describe the active MVP boundary, the upcoming releases, the current end-to-end surfaces, and the gates an item must clear before it ships.

Active · 01

MVP roadmap

The robust 0.x.0 MVP target — not 1.0.0. Names the readiness work that affects trust, release hygiene, documentation accuracy, provider support, and package publication. Read this when you want to know what blocks the first public release.

  • OIDC operator identity
  • credential profiles
  • OpenAI + Anthropic support
  • release gates
Read the MVP roadmap

Definition of done

The MVP is complete when Craik can run one real software-delivery workflow end to end with OIDC-authenticated operators, typed credential profiles, policy-enforced side effects, durable receipts, a useful handoff, accurate documentation, and package-release quality gates.

— MVP roadmap · §MVP Definition

  1. 02 · One workflow

    MVP plan

    The original MVP scope: prove one complete workflow instead of a broad platform shell. The accepted primary demo is Stigmem documentation and state reconciliation — the workflow CI exercises end-to-end.

    The MVP should prove one complete workflow instead of building a broad platform shell. MVP plan · §MVP Goal

    • stigmem demo
    • governed workflow
    • handoff backed by memory
    • capability receipts

    For: contributors · Read MVP

  2. 03 · Long view

    Roadmap

    The broader trajectory: smallest useful runtime first, then Stigmem-native memory, runner adapters, multi-agent coordination, instruction distillation, community extensions. Seven roadmap rules keep features from shipping without docs, evidence, and policy posture.

    Every roadmap item must produce implementation, tests or validation, and documentation. Craik should not ship features that only exist as code or only exist as strategy. Roadmap · §Roadmap Rules

    • seven rules
    • CLI first
    • evidence before memory
    • strict-by-default

    For: anyone · Read roadmap

  3. 04 · Pass / fail snapshot

    Release readiness · v0.1.0

    The concrete checklist validated on 2026-05-17 against main. CI green, CodeQL green, schema and contract regressions verified. Repository-owned readiness is complete; remaining work is the protected publication process at tag time.

    Repository-owned readiness checks are complete. The remaining work is outside the repository: create the v0.1.0 tag and run the protected publication process when the maintainer is ready. Release readiness · §Summary

    • CI green
    • CodeQL green
    • schema regressions
    • publication gate

    For: maintainers · Snapshot

  4. 05 · What's not yet

    Limitations

    The honest scope boundary. Lists the v0.1 end-to-end surfaces that work today (home init, project registration, case-file assembly, local state inspection, policy gates, foreground gateway health service) and the deliberately post-MVP surfaces (hosted gateway dispatch, operator dashboards, broad live tool execution).

    Several surfaces are not yet end-to-end production workflows. Limitations · §Intro

    • working today
    • post-MVP scope
    • v0.12 contract coverage
    • honesty boundary

    For: everyone · Honest scope

  5. 06 · How it gets built

    Implementation plan

    The accepted stack and build sequence. Python 3.12+, Typer CLI, Pydantic schemas, SQLite for local state, stdlib HTTP for first integrations, pytest for tests, ruff and mypy for quality. The sequence of milestones that gets v0.1 to release.

    This plan turns the Craik concept into a buildable sequence. Implementation plan · §Intro

    • python 3.12+
    • typer + pydantic
    • milestones
    • quality gates

    For: contributors · Build plan

Architecture decisions

The reasons behind the structural choices.

ADRs record durable design decisions separately from mutable reference material. Reference docs describe current behavior; ADRs explain why the shape exists, what tradeoffs were accepted, and how a decision can be retracted.

  1. ADR 0001 · Accepted

    MVP runner scope

    Sets the public framing: the MVP ships case-file assembly, policy envelopes, prompt compilation, receipts, handoffs, and one governed workflow — not unbounded tool execution.

    Read decision
  2. ADR 0002 · Accepted

    Provider transport & mode families

    OpenAI Responses, Anthropic Messages, and OAI-compatible Chat Completions stay as separate transport families — not collapsed into a single adapter — so tool, streaming, usage, and retry differences stay explicit.

    Read decision
  3. ADR 0003 · Accepted

    Secret handling

    Receipts, handoffs, case files, provider configs, and local store records are scrubbed through a central redaction guard before persistence. Secret material is referenced — never copied.

    Read decision
  4. ADR 0004 · Accepted

    Policy envelope shape

    The policy envelope binds actor, task, profile, grant requirements, redaction posture, and receipt obligations into one typed record that travels with every governed action.

    Read decision
  5. ADR 0005 · Accepted

    Receipts & handoffs as public contracts

    Receipts and handoffs sit at the boundary of runtime, memory, docs, and operator workflows. They are versioned public contracts — adapters and plugins integrate against them without renegotiating shape.

    Read decision
  6. ADR 0006 · Accepted

    Package & runtime layout

    Splits the historically flat runtime namespace into ownership-bearing modules (providers, memory, policy, work execution, companions, channels, voice, sandboxing, project workflows) so change rates and risk profiles can diverge cleanly.

    Read decision
  7. ADR 0007 · Accepted

    Credential & identity architecture

    Provider credentials and operator identity are governance inputs. Every receipt names which human authorized work, which credential carried the call, which policy allowed it, and which grant made the credential usable.

    Read decision
  8. Index · Catalog

    ADR index

    The full catalog of accepted decisions and the conventions for proposing new ones, retiring old ones, and citing them from reference docs.

    Browse all
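ADR 0003 above describes a central redaction guard that scrubs records before persistence. A minimal sketch of that idea follows, assuming a key-name heuristic for spotting secrets. The pattern, the `[REDACTED]` marker, and the `redact` and `persist` helpers are hypothetical; the ADR itself defines the real rules.

```python
import re
from typing import Any

# Hypothetical heuristic: keys with these substrings are treated as secret-bearing.
SECRET_KEY = re.compile(r"(token|secret|password|api_key)", re.IGNORECASE)
REDACTED = "[REDACTED]"


def redact(record: Any) -> Any:
    """Return a copy of record with secret-looking values replaced."""
    if isinstance(record, dict):
        return {
            key: REDACTED if SECRET_KEY.search(str(key)) else redact(value)
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [redact(item) for item in record]
    return record


def persist(store: list[Any], record: dict[str, Any]) -> None:
    """Single persistence path: nothing is stored without passing the guard."""
    store.append(redact(record))
```

Routing every write through one function such as `persist` is what makes the guard central: a new record type cannot skip redaction by accident.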

Stigmem integration

The reference memory substrate — and the boundary.

Craik runs in degraded local mode without Stigmem, but Stigmem is the reference substrate for team-scale memory. One doc draws the exact boundary between what Craik owns and what Stigmem owns.

Where to go next

Once the concepts are clear, choose your path: