Version: MVP

Roadmap

20 min read · For maintainers & integrators · Updated 2026-05-22

What you'll find here

The full Craik roadmap: rules every roadmap item must satisfy, the documentation model, the release-gate sequence from v0.1.0 through v0.12.0 (plus a post-MVP stability gate), and the 24 executable workstreams that turn those gates into shippable PRs.

Executable by design.

Every roadmap item must produce implementation, tests or validation, and documentation. Craik should not ship features that only exist as code or only exist as strategy. The smallest useful runtime ships first; everything else builds on top.

Roadmap rules

  1. Every feature has docs. User-facing behavior requires guide docs · runtime contracts require reference docs · policy behavior requires security/governance docs · adapter behavior requires integration docs.
  2. Docs ship with implementation. A feature is not done until its docs, examples, and validation guidance are merged.
  3. Strict by default. New capabilities must respect policy envelopes, grants, redaction, receipts, and memory-write defaults.
  4. Evidence before memory. No durable assertion or Stigmem write without evidence, provenance, and scope.
  5. Source remains canonical. Derived memory, distilled instructions, generated docs, and summaries must cite source artifacts.
  6. CLI first, UI later. CLI workflows prove the runtime before dashboard work broadens the surface.
  7. Stigmem is the reference substrate. Local mode exists for onboarding and tests, but full durable behavior assumes Stigmem.

Documentation model

Craik docs borrow the most useful patterns from three product lineages.

From Stigmem

Explicit concept docs · protocol & contract reference · generated API/CLI reference once implementation exists · roadmap & limitations · security & governance · durable examples tied to real workflows · clear public/internal boundaries.

From local agent runtimes

Practical setup · workspace/project mental model · tool/skill/plugin docs · channel/adapter docs · operator-friendly examples.

From multi-agent orchestration tools

CLI-first user guides · configuration docs · skills docs · security/approval docs · multi-agent workflow docs · exact dependency & supply-chain notes where relevant.

Target docs tree:

docs/
  concepts/
    durable-agent-runtime.md
    project-models.md
    case-files.md
    handoffs.md
    receipts.md
    work-graph.md
    memory-and-stigmem.md
    governance.md
    instruction-distillation.md
    skills-and-plugins.md
  guides/
    installation.md
    quickstart.md
    first-stigmem-reconciliation-demo.md
    configuring-craik-home.md
    connecting-stigmem.md
    using-case-files.md
    writing-handoffs.md
    running-policy-tests.md
    runner-adapters.md
    community-skills.md
    community-plugins.md
  reference/
    cli.md
    config.md
    schemas.md
    policy-profiles.md
    memory-backends.md
    runner-adapter-contract.md
    plugin-contract.md
  security/
    index.md
    redaction.md
    secrets.md
    capability-grants.md
    fail-open-profiles.md
  roadmap.md
  limitations.md

Root-level project governance files remain authoritative for contribution, security disclosure, trademarks, and maintainership.

MVP release strategy

0.x.0 MVP first, 1.0.0 later.

1.0.0 is a later stability signal after real-world usage, compatibility confidence, and security soak. The MVP still pulls forward readiness work that affects trust: migrations, release hygiene, package publication, generated docs, security process, provider certification, public/internal boundaries, provenance tracking, memory hygiene, and CI/CD depth.

The checkable MVP plan lives in Robust MVP Roadmap. The release gates below describe the contract and feature build-up through v0.12; the MVP roadmap turns those surfaces into release-quality workflows.

Release gates

Craik stays on 0.x.0 releases until the maintainers are confident the product is stable enough to call 1.0. The gates below are sequencing targets, not promises that 1.0.0 follows immediately after 0.12.0. Additional 0.x.0 releases are added whenever the product needs more soak time, compatibility work, security hardening, or real-user validation.

v0.1.0 · Governed agent-runtime substrate

Required outcome. A user can register a real repo, authenticate via OIDC, assemble a governed case file, compile runner prompts, execute provider requests against OpenAI Responses / Anthropic Messages / OpenAI-compatible Chat Completions adapters (fixture-backed by default, live opt-in), resolve provider credentials through typed profiles or workload-federated brokering, record receipts that name both the operator identity and the credential identity, produce durable handoffs, propose memory updates, and export the work graph. Policy can constrain which operators and which credentials a task may use; credential authorization is itself a receipted graph.

Required capabilities:

Python 3.12+ package

And craik CLI.

MIT license

And governance files.

Local home

~/.craik default · CRAIK_HOME override.

Pydantic contracts

SQLite local state

Project registry

Policy profiles

Strict · trusted-local · automation.

Capability grants

Central redaction utility

Receipt store

Handoff writer

Memory backends

Local backend + Stigmem read client.

Case-file assembler

Evidence · assumptions · context budget · default exclusions with project/user overrides.

Intent locks

And self-audit before handoff.

Memory proposals

With diff and impact preview.

Work graph export

Read-only GitHub adapter

Stigmem demo

Docs reconciliation demo + behavior-test acceptance.

Provider transport

FixtureTransport · HTTPTransport over stdlib urllib with SSE.

Provider families

OpenAI Responses · Anthropic Messages · OpenAI-compatible Chat Completions.

Provider features

Tool-call round-trip · streaming chunk capture · retry · timeout · cancellation.

Single-agent loop

Task run state machine · plan/act/observe/evaluate phases · runner step contract · per-step receipts and policy gates · output capture · memory proposal creation · handoff on completion/block/failure.

Typed credentials

auth-profiles.json with <provider_family>:<name> IDs.
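As an illustration of the `<provider_family>:<name>` ID shape, here is a minimal parsing sketch. The family slugs, the `ProfileId` class, and `parse_profile_id` are hypothetical names chosen for the example; the roadmap fixes only the ID format, not how Craik spells or validates it.

```python
from dataclasses import dataclass

# Hypothetical family slugs: the roadmap names the provider families
# but does not state how they are spelled inside profile IDs.
KNOWN_FAMILIES = {"openai-responses", "anthropic-messages", "openai-compatible"}


@dataclass(frozen=True)
class ProfileId:
    provider_family: str
    name: str

    def __str__(self) -> str:
        return f"{self.provider_family}:{self.name}"


def parse_profile_id(raw: str) -> ProfileId:
    """Split '<provider_family>:<name>' on the first colon and validate both halves."""
    family, sep, name = raw.partition(":")
    if not sep or not family or not name:
        raise ValueError(f"expected '<provider_family>:<name>', got {raw!r}")
    if family not in KNOWN_FAMILIES:
        raise ValueError(f"unknown provider family: {family!r}")
    return ProfileId(family, name)
```

Splitting on the first colon only keeps any later colons inside the profile name, which is one possible design choice rather than documented behavior.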

Credential sources

Env-var API key · local-CLI OAuth fallback · vendor-CLI subprocess bridge · external secret manager reference · marker · Stigmem-backed reference.

Credential pool

Rotation · failover · per-profile health.

OIDC operator login

Device-code · loopback+PKCE · IdP discovery · JWKS-validated ID tokens · refresh.

Workload identity

CI · Kubernetes · generic file · env-var.

RFC 8693 token exchange

Federated credential brokering.

Operator session

operator-session.json · craik login · craik logout · craik whoami.

Credential CLI

craik auth list / add / remove / test / status / approve / grant.

Doctor integration

Credential health surfaced in craik doctor.

Identity on every call

Operator and credential identity bound to every provider call and receipt.

Policy-bound identity

required_operator · allowed_operator_groups · allowed_credential_kinds · allowed_credential_profiles.

Approval-gated first use

For any credential profile.

Authorization binding

Operator-credential receipted grant chain.

Expiry as evidence

Credential expiry surfaced as evidence/risk in case files.

Per-credential redaction

Per-agent isolation

Credential and operator isolation in handoff records (consumed by v0.3.0 multi-agent runtime).

Explicitly not required for v0.1.0:

Resumable runs

Across process crashes.

Real sandbox tool execution

Provider budget enforcement

At the call boundary.

Schema migration framework

Multi-agent runtime

Beyond handoff identity bookkeeping.

Instruction distillation pipeline

Operator UI / TUI

Gateway daemon

And channel adapters.

MCP client/server

Skill / plugin runtime

Learning loops

Companion surfaces

Migration tooling

v0.2.0 · Durable execution continuity

Required outcome. A run interrupted at any phase boundary can be resumed cleanly with no duplicated side effects. Tool calls execute inside at least one real sandbox backend, gated per call. Budgets are enforced at the call boundary, not just declared. Persistent state survives schema changes via a documented migration path.

The execution continuity slices have landed: #552 covers phase-boundary resume and deterministic step idempotency keys. #554 covers per-run wall-clock budget enforcement before new phase or tool rounds. #556 covers provider token budget ledger updates and interruption before additional provider calls once the budget is exhausted. #559 covers operator-facing run inspection, resume, and cancellation commands. #561 covers runtime exit-discipline checks persisted at the handoff boundary. #563 covers tool-result attestations for dispatched provider tool calls. #565 covers registered local-process sandbox execution for shell tool calls. #567 covers cancellation propagation into in-flight local-process sandbox commands. #569 covers the registered local-store migration runner and example migration. #571 covers the CLI run-delta view for persisted continuity state.

Resumable interrupted runs

Shipped: interrupted runs reopen from persisted phase outputs and continue at the next unfinished phase.

Step-level idempotency keys

Shipped: stable keys are recorded in run state and runner step context to avoid duplicated phase outputs and side effects on replay.
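One way to picture a stable step key is a hash over the canonical JSON of the step's identifying fields, paired with a ledger that refuses to run the same key twice. This is a sketch under assumptions: the key inputs (`run_id`, `phase`, `step_index`, `payload`) and the in-memory `StepLedger` are hypothetical stand-ins for whatever Craik persists in run state.

```python
import hashlib
import json
from typing import Any, Callable


def step_idempotency_key(run_id: str, phase: str, step_index: int, payload: dict[str, Any]) -> str:
    """Derive a deterministic key: same inputs always hash to the same value."""
    canonical = json.dumps(
        {"run_id": run_id, "phase": phase, "step_index": step_index, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StepLedger:
    """In-memory stand-in for persisted run state (hypothetical)."""

    def __init__(self) -> None:
        self._outputs: dict[str, Any] = {}

    def run_once(self, key: str, action: Callable[[], Any]) -> Any:
        # On replay, return the recorded output instead of repeating the side effect.
        if key in self._outputs:
            return self._outputs[key]
        result = action()
        self._outputs[key] = result
        return result
```

Sorting keys before hashing means dictionary ordering cannot change the key between the original run and a resumed one.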

Time controls

Shipped: per-run wall-clock budgets interrupt before the next phase or tool round when exhausted.

Provider budget enforcement

Shipped: provider token budgets are decremented from usage metadata and interrupt before the next provider call when exhausted.
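The decrement-then-interrupt behavior described above can be sketched as a small ledger that is checked before each provider call and updated from usage metadata afterwards. The class names and the `input_tokens` / `output_tokens` keys are assumptions for illustration; real usage metadata differs by provider.

```python
from dataclasses import dataclass
from typing import Mapping


class BudgetExhausted(RuntimeError):
    """Raised before a provider call when no token budget remains."""


@dataclass
class TokenBudgetLedger:
    limit: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def check_before_call(self) -> None:
        # Interrupt before the next call, not in the middle of a response.
        if self.used >= self.limit:
            raise BudgetExhausted(f"token budget exhausted: {self.used}/{self.limit}")

    def record_usage(self, usage: Mapping[str, int]) -> None:
        # Hypothetical usage keys; missing keys count as zero.
        self.used += int(usage.get("input_tokens", 0)) + int(usage.get("output_tokens", 0))
```

In this sketch a call that starts under budget may overshoot it, and the overshoot is only caught at the next `check_before_call`, which matches the "interrupt before the next provider call" wording.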

Run inspection & recovery

Shipped: craik run show, craik run resume, and craik run cancel expose persisted continuity state.

Agent exit discipline

Shipped: handoff creation persists exit-discipline checks so missing validation, risks, or next steps are runtime state.

Tool result attestation

Shipped: dispatched tool calls persist hashed attestations linked to the side-effect receipt and replay message.

One real sandbox backend

Shipped: local_process executes registered command references through subprocess.run without shell expansion when the loop is configured with a sandbox backend.

Sandbox cancellation

Shipped: local-process sandbox commands poll a cancellation event, terminate in-flight processes, and replay a cancelled tool result.
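A minimal sketch of the poll-and-terminate pattern, assuming a `threading.Event` as the cancellation signal and a plain dict as the replayed tool result. Both shapes are illustrative assumptions; only the polling, termination, and cancelled-result behavior come from the description above.

```python
import subprocess
import threading


def run_cancellable(argv: list[str], cancel: threading.Event, poll_interval: float = 0.05) -> dict:
    """Run argv, checking the cancellation event every poll_interval seconds."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    while True:
        try:
            # communicate() keeps draining output between polls, so a chatty
            # child cannot block on a full pipe while we wait.
            stdout, stderr = proc.communicate(timeout=poll_interval)
            return {"status": "completed", "returncode": proc.returncode, "stdout": stdout}
        except subprocess.TimeoutExpired:
            if not cancel.is_set():
                continue
            proc.terminate()  # polite stop first
            try:
                proc.communicate(timeout=2.0)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalate if the child ignores terminate
                proc.communicate()
            return {"status": "cancelled", "returncode": proc.returncode}
```

The two-second grace period before `kill()` is an arbitrary choice for the sketch.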

Schema migration framework

Shipped: local-store migrations run through a registered, forward-only migration runner with an example metadata migration.
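The following sketch shows one common shape for a registered, forward-only runner over SQLite, using `PRAGMA user_version` to record the applied version. The decorator, the two example migrations, and the use of `user_version` are assumptions for illustration; the roadmap does not say how Craik tracks its schema version.

```python
import sqlite3
from typing import Callable

Migration = Callable[[sqlite3.Connection], None]
MIGRATIONS: dict[int, Migration] = {}


def migration(version: int) -> Callable[[Migration], Migration]:
    """Register a migration under a unique, increasing version number."""
    def register(fn: Migration) -> Migration:
        if version in MIGRATIONS:
            raise ValueError(f"duplicate migration version: {version}")
        MIGRATIONS[version] = fn
        return fn
    return register


@migration(1)
def create_metadata(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


@migration(2)
def record_layout(conn: sqlite3.Connection) -> None:
    # Example metadata migration (hypothetical content).
    conn.execute("INSERT INTO metadata (key, value) VALUES ('layout', 'v2')")


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order; there is no downgrade path."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    if current > max(MIGRATIONS, default=0):
        raise RuntimeError("store is newer than this code; refusing to downgrade")
    for version in sorted(v for v in MIGRATIONS if v > current):
        with conn:  # commit on success, roll back on error
            MIGRATIONS[version](conn)
            conn.execute(f"PRAGMA user_version = {version}")
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running `migrate` twice is a no-op the second time, because only versions above the stored one are applied.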

Run delta view

Shipped: craik run delta renders persisted run-delta records and linked recovery sessions as an operator view or JSON.

v0.3.0 · Multi-agent review and coordination

Required outcome. A handoff produced by agent A can be consumed by agent B as the starting state of a new governed run. Two agents working against the same project are coordinated via the work graph and intent locks without colliding. Disagreement between agents produces structured debate artifacts with receipted resolution.

Handoff consumption

Shipped first slice: craik task resume --from-handoff=<id> creates a new task, case file, and pending run with source handoff provenance and explicit consumer identity.

Role-based dispatch

Shipped first slice: provider-backed runs can dispatch implementer · verifier · adversarial reviewer · policy reviewer · docs reviewer · memory curator · adjudicator roles with policy allow-lists, dispatch receipts, and run metadata.

Multi-agent message contract

Shipped first slice: craik agent-message send and craik agent-message receive persist authenticated send/receive receipts and link to task, run, handoff, and role identities.

Concurrent run coordination

Shipped first slice: simultaneous loops on the same project are checked against active intent-lock scopes before new phases or tool dispatch, and overlapping scopes produce denial receipts.
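The overlap check can be sketched as follows, assuming an intent-lock scope is a project-relative path prefix and a denial receipt is a plain dict. Both are assumptions made for the example; the roadmap does not define how scopes are expressed.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class IntentLock:
    run_id: str
    scope: str  # assumed to be a project-relative path prefix


def scopes_overlap(a: str, b: str) -> bool:
    """Two scopes overlap when they are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def check_scope(run_id: str, scope: str, active_locks: list[IntentLock]) -> dict:
    """Return an allow decision, or a denial naming the conflicting run."""
    for lock in active_locks:
        if lock.run_id != run_id and scopes_overlap(scope, lock.scope):
            return {
                "decision": "deny",
                "run_id": run_id,
                "scope": scope,
                "conflicts_with": lock.run_id,
            }
    return {"decision": "allow", "run_id": run_id, "scope": scope}
```

Comparing path components instead of raw string prefixes keeps `src/api` and `src/apix` from being treated as overlapping.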

Structured debate runtime

Shipped first slice: role-linked positions become typed debate turns, summaries preserve agreement or disagreement, and resolution records an adjudication or human-delegation receipt.

Cross-agent review protocol

Shipped first slice: review requests target worker results, handoffs, or debate summaries; review results carry typed findings, receipts, and source artifact links without mutating the reviewed artifact.

Human delegation at runtime

Shipped first slice: craik delegation pause interrupts a run with a receipted delegation request, craik delegation resolve records accepted/rejected/cancelled responses, and existing run resume continues from the interrupted boundary.

Scope-change protocol

Shipped first slice: discovered scope outside the active intent lock interrupts the run, persists a receipted request, and craik scope-change decide requires an explicit expand / sibling / handoff / denial decision.

Live work graph

Shipped first slice: v0.3.0 coordination artifacts persist work-graph events, and operators can query the active graph before final export.

Per-agent isolation enforced

Shipped first slice: handoff consumers record their own credential/operator assignment, producer identity reuse is denied by default, and intentional continuation requires an explicit flag plus rationale.

v0.4.0 · Runtime instruction distillation

Required outcome. Declared instruction files in a repo are ingested into typed, provenance-linked distillation items with categorized extraction, stale invalidation, contradiction surfacing, and an approval flow. Approved distillations participate in case files and prompt compilation as first-class evidence.

Source registry

Shipped first slice: declared sources are registered explicitly with receipts, project confinement, symlink escape protection, and active source lists.

Source ingestion

Shipped first slice: AGENTS.md · CLAUDE.md · GEMINI.md · HERMES.md · SKILLS.md · .cursorrules · .github/copilot-instructions.md · .codex/instructions.md · declared policy docs parse through a project-confined pipeline.

Source hash tracking

Shipped first slice: newline-normalized SHA-256 snapshots track new, unchanged, changed, missing, and oversize source states with stale-invalidation input.

Line/range provenance

Shipped first slice: extracted statements retain source snapshot IDs, line/column ranges, summaries, and canonical excerpt hashes.

Categorized extraction

Shipped first slice: deterministic proposal creation covers instruction · policy · preference · command · boundary · handoff-rule · memory-rule · security-rule · stale-risk with unclassified warnings.

Inter-source contradictions

Shipped first slice: normalized cross-source conflicts create contradiction reports while same-source and stale candidates are skipped.

Approval flow

Shipped first slice: governing constraints require explicit operator approval, receipt HMAC verification, active-session identity binding, and override rationale for stale or contradicted proposals.
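Receipt HMAC verification, in its simplest form, recomputes a keyed digest over a canonical encoding of the receipt and compares it in constant time. The receipt shape, the `hmac` field name, and the canonical-JSON encoding below are assumptions for the sketch, not Craik's receipt format.

```python
import hashlib
import hmac
import json


def canonical_bytes(receipt: dict) -> bytes:
    """Encode every field except the MAC itself as sorted, compact JSON."""
    body = {k: v for k, v in receipt.items() if k != "hmac"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(receipt: dict, key: bytes) -> dict:
    """Return a copy of the receipt with an HMAC-SHA256 tag attached."""
    mac = hmac.new(key, canonical_bytes(receipt), hashlib.sha256).hexdigest()
    return {**receipt, "hmac": mac}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the tag and compare it without leaking timing information."""
    expected = hmac.new(key, canonical_bytes(receipt), hashlib.sha256).hexdigest()
    presented = str(receipt.get("hmac", ""))
    return hmac.compare_digest(expected.encode("utf-8"), presented.encode("utf-8"))
```

Any edit to a signed field, such as changing the recorded operator, makes verification fail unless the editor also holds the key.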

Case-file integration

Shipped first slice: governing distillations load as first-class evidence with provenance ranges and approval receipt snapshots.

Prompt compilation

Shipped first slice: governing distillations render in one sanitized, deterministic Active instruction constraints section with stale-exclusion warnings.

Distillation CLI

Shipped first slice: craik instructions register / ingest / list / approve / reject / show drives the lifecycle through the active operator session.

v0.5.0 · Quality, continuity, and recovery

Required outcome. Craik helps agents recover, improve handoffs, avoid stale context, and explain what changed between runs.

Recovery mode

Ready: #636. Recovery sessions and run deltas are persisted, HMAC-protected in the local store, and exposed through operator-gated craik run recover / craik run delta.

Runtime critic

Ready: #637. Reviewable, non-authoritative critic findings are captured through production helpers and craik review critic.

Red team mode

Ready: #638. Red-team findings are durable quality evidence captured through craik review red-team without becoming privileged instructions.

Handoff quality score

Ready: #640. Handoff creation persists deterministic score bands and names blocking reasons for poor handoffs.

Evidence coverage score

Ready: #639. Handoff creation persists missing evidence ids and weak claims as inspectable gaps.

Context debt tracking

Ready: #641. Omitted, stale, missing, and unresolved context becomes queryable debt with remediation state.

Tool result attestation

Ready: #643. Observed tool outputs are attested with trust class, evidence or receipt links, expiry, output-hash verification, and local HMAC integrity metadata.

Knowledge freshness probes

Ready: #644. Fresh, expiring, expired, and missing probes are captured through production helpers and produce stale-risk warnings.

Evidence expiration rules

Ready: #642. Expired and missing attestations are classified before reuse.

Known traps

Ready: #646. Active traps carry evidence, avoidance guidance, and expiry or contradiction state, and can be captured with craik knowledge trap.

Negative knowledge

Ready: #647. Evidence-backed negative statements remain scoped and freshness-bounded, and contradictions are opened instead of silently replacing positive assertions.

Scratchpad with expiry

Ready: #645. Temporary working notes are captured, expire, and are excluded from active summaries after expiry.

First-class unknowns

Ready: #648. Unresolved and resolved unknowns carry owner, next action, resolution state, and receipt linkage for resolution.

Structured context requests

Ready: #650. Missing context can be requested, fulfilled, cancelled, linked to handoffs, recovery, or unknowns, and fulfilled with receipt linkage.

"What changed since last time" deltas

Ready: #652. Run deltas summarize current and previous handoff, case-file, receipt, contradiction, and constraint state.

Agent exit discipline

Ready: #651. Handoff creation enforces validation, handoff, residual-risk, next-step, unknown, and open-context checks unless an explicit blocked-exit override rationale is recorded.

Release readiness and docs assessment

Ready: #649. Release-readiness docs now record v0.5.0 validation, remediation, and release-prep gates.

v0.6.0 · Skills, plugins, and ecosystem foundations

Required outcome. Craik can support reusable skills and governed plugins without weakening the runtime security model.

Skill package format

Ready: #659. Skill packages use semantic package versions, docs and entrypoints, no runtime authority, and explicit context declarations for expected inputs.

Project-scoped & global skills

Ready: #660. Registries enforce project/global scope, active entry completeness, active-only precedence, and project override precedence.

Context contracts for skills

Ready: #661. Invocation contexts capture redacted inputs, outputs, omissions, policy links, receipts, and package requirement validation.

Plugin descriptor format

Ready: #662. Descriptors declare identity, trust boundary, semantic versioning, entrypoints, capability requests, docs, security notes, and compatibility without granting authority.

Probationary plugins

Ready: #663. Probation records keep new or changed plugins out of durable trust until criteria, evidence, compatibility checks, and decisions are recorded.

Plugin capability grants

Ready: #664. Plugin grants are descriptor-bound, evidence-linked, scoped to explicit operations and targets, approval-aware, and expiry-checked.

Plugin receipts

Ready: #665. Plugin receipts are redacted, descriptor-linked, grant-linked, evidence-linked, handoff-linked, and operator-visible with optional probation state.

Adapter packages

Ready: #666. Adapter packages declare semantic versions, entrypoints, capability surfaces, runner modes, Python/platform compatibility, docs, provenance, and linked plugins.

Reference integrations

Ready: #667. Reference integrations provide safe reproducible skill, plugin, and adapter examples with narrow matching links, checks, fixtures, receipts, and provenance.

Community skills docs

Ready: #668. The guide covers package authoring, context declarations, project/global registry scope, review expectations, and security boundaries.

Community plugins docs

Ready: #669. The guide covers descriptors, probation, grants, receipts, adapters, references, and no-ambient-authority review boundaries.

Release readiness and docs assessment

Ready: #670. Release-readiness docs record v0.6.0 goal status, validation, security notes, blockers, changelog coverage, and release automation hygiene.

v0.7.0 · Operator experience

Required outcome. Operators can inspect project state without reading raw logs.

Dashboard / TUI decision

Ready · CLI-first craik operator overview selected for v0.7.0.

Work graph explorer

Ready · craik operator work-graph renders terminal and JSON graph inspection.

Handoff viewer

Ready · craik operator handoff renders durable handoff summaries.

Receipt viewer

Ready · craik operator receipt renders capability and plugin receipts.

Contradiction inbox

Ready · craik operator contradictions lists review-only contradiction state.

Evidence & assumption views

Ready · craik operator evidence keeps assumptions separate from evidence.

Delegation queue

Ready · craik operator delegations lists human delegation points.

Budget / quota view

Ready · craik operator budget displays missing budget data explicitly.

Instruction distillation view

Ready · craik operator instructions renders sources, snapshots, provenance, proposals, and reviews.

Quality gate view

Ready · craik operator quality summarizes handoff, evidence, critic, and red-team signals.

Memory impact preview

Ready · craik operator memory-impact inspects previewed durable-memory effects.

Known traps view

Ready · craik operator traps renders known traps and negative knowledge with project and task filters.

Run delta view

Ready · craik operator run-delta inspects persisted recovery and continuity deltas.

The operator surface is session-bound and scoped before release prep: read-only commands require an active operator session, multi-project list views require explicit project scope, and operator text/JSON output uses the same sanitization and redaction boundary as runtime memory and receipt paths.

v0.8.0 · Operator integrations and always-on gateway

Required outcome. Craik can run as an always-on operator service with controlled ingress from external channels.

Gateway daemon mode

craik setup wizard

craik doctor diagnostics

craik update guidance

Channel adapter contract

First messaging channel adapter

Inbound identity & pairing

Channel allowlists

Channel-scoped policy envelopes

Webhook ingress

Scheduled automations

Cron-like task creation

Gateway receipts

Gateway troubleshooting docs

Deferred until this phase or later.

Broad channel matrix · consumer assistant positioning · open inbound DM behavior · mobile companion surfaces.

v0.9.0 · Persistent agent runtime, providers, and sandboxes

Required outcome. A user can launch a persistent Craik agent with craik or craik run, authenticate it against OpenAI, Anthropic, Gemini, or local models, and have Craik choose model/provider/runtime execution paths while enforcing environment boundaries explicitly across multiple sandbox backends.

Persistent Craik agent runtime

craik / craik run launch UX

Agent lifecycle commands

Start · status · stop · restart.

Agent session state contract

Agent session event contract

Prompt · run · receipt · handoff · interruption · exit.

Provider authentication flow

OpenAI · Anthropic · Gemini · local models.

Guided provider setup UX

Provider-backed agent sessions

Interactive prompt loop

Run / agent boundary decision

Model provider registry

Provider switching UX

Provider failover policy

Provider budget & quota links

Local model routing

Local model presets

OpenAI-compatible · Ollama-style endpoints.

Gemini provider/runtime path

Provider certification matrix

Failure recovery

Reconnect · resume · auth expiry · sandbox failure.

Persistent-agent security model

End-to-end launch demo

MCP client integration

MCP server / export decision

Local process backend

Docker sandbox backend

SSH or remote shell backend

Browser / tool execution boundary

Environment capability receipts

Sandbox policy tests

Provider routing docs

Implementation status: ready for release prep.

The v0.9.0 goal workflow shipped through milestone issues #737, #738, #740, #741, #742, #743, #744, and #745. Release prep remains responsible for the final version bump, changelog, signed tag, package publication, docs publication, and post-release verification.

v0.10.0 · Agent shell, progressive setup, and learning controls

Required outcome. A user can launch Craik with craik before any provider or operator auth is configured, receive clear in-runtime guidance, configure auth/model/session state through browser-assisted and slash-command flows, and review self-improving skill changes without allowing agents to silently rewrite their own authority.

craik interactive agent shell

craik chat and quiet one-shot mode

Progressive setup states

Unconfigured · fixture · local model · operator-only · provider-only · fully ready · restricted/offline.

Runtime slash-command registry

/help · /setup · /auth · /provider · /model · /status · /doctor · /sessions · /approvals.

Browser-assisted provider login

OpenAI · Anthropic · Gemini · local models, using official OAuth where available and guided secure key capture otherwise.

Secure credential storage

OS keychain backends plus explicit file-backed fallback warnings.

Model UX layer

List · status · set · probe · aliases · fallbacks · in-session switching.

Session UX

List · show · resume · rename · export · prune · delete.

Profiles and personas

Isolated provider config, sessions, skills, memory, and gateway state.

Skill performance telemetry

Autonomous skill proposals

Skill improvement proposals

Skill eval / replay harness

Periodic memory review nudges

Preference modeling as facts

Learning-loop receipts

Promotion approval gates

Rollback path

For bad skill updates.

Usage and insight summaries

Provider calls · tokens · costs where known · approvals · denials · session activity · skill impact.

Trajectory export format

Trajectory compression

Learning-loop docs

Builds on instruction distillation and the skill/plugin system. Agents may propose changes to skills, but changes remain reviewable until policy allows promotion. The agent shell is the public interaction surface for setup and learning controls; subsystem CLI commands remain available for automation.

Implementation status: ready for release prep.

The v0.10.0 goal workflow shipped through milestone issue #779. Release prep is tracked in #781 and remains responsible for the final version bump, changelog, signed tag, package publication, docs publication, and post-release verification.

v0.11.0 · TUI, dashboard, desktop, gateway operations, and channels

Required outcome. Craik can expose durable agent work through a keyboard-first TUI, authenticated local dashboard, desktop companion surface, manageable gateway service lifecycle, first real channel adapters, and multimodal companion contracts without compromising its policy and evidence model.

craik --tui

Shared slash commands · model/session pickers · approvals · run/handoff/receipt panels · streaming output.

craik dashboard

Authenticated local web dashboard for status, sessions, runs, approvals, provider/model state, gateway logs, and skill proposals.

Desktop companion MVP

Gateway control · provider health · approval notifications · dashboard launch · diagnostics.

Gateway service lifecycle

Install · uninstall · start · stop · restart · status · logs · doctor.

Real channel adapters

WebChat · Telegram · Discord · Slack.

Channel pairing and allowlists

Channel-scoped policy envelopes

Approval queue UX

Shell · TUI · dashboard · desktop notifications.

Product-grade diagnostics

craik doctor --fix for narrow, explicit setup and security posture repairs.

Update workflow

craik update --check · craik update.

Voice I/O posture

Speech-to-text adapter contract

Text-to-speech adapter contract

Multimodal artifact references

Desktop companion app security

Mobile companion app decision

Visual workspace decision

Work graph → workspace bridge

Accessibility requirements

Multimodal redaction tests

Product surface phase.

This phase turns the governed runtime into a usable local agent product. The companion surfaces must share the same command/action registry, auth model, policy gates, and receipt boundaries as the CLI.

The v0.11.0 goal workflow is complete. Release prep is tracked in #823 and remains responsible for the final version bump, changelog, signed tag, package publication, docs publication, and post-release verification.

v0.12.0 · Migration, ecosystem compatibility, and i18n

Required outcome. Teams can adopt Craik from adjacent tools and operate it in broader language and ecosystem contexts through executable import dry-runs, compatibility fixtures, bridge protocols, secret migration policy, and localized operator-facing surfaces.

Adjacent-runtime migration inspect

Adjacent-runtime migration plan

Adjacent-runtime import dry-run

Migration reports

Automatic imports · manual actions · skipped secrets · security posture changes · next commands.

Memory / skill / config migration maps

Secret migration implementation

No raw secret copy by default · OS keychain import · redacted migration receipts.

Compatibility fixtures

Provider config · model fallback · profile · channel binding · session · memory · skill · schedule · sandbox shapes.

MCP server mode

MCP client config import/export

Session export/import compatibility

Agent/client protocol bridge decision

Multi-agent workflow bridge

Locale / i18n framework

Localized shell/TUI/dashboard messages

Translated docs strategy

Ecosystem compatibility tests

v0.12.x fast-follow status: provider OAuth suite ready for release prep.

The v0.12.7 goal workflow shipped provider OAuth contracts, loopback PKCE helpers, Anthropic Claude CLI delegation, Gemini/Vertex ADC and service-account login through google-auth, OpenAI browser PKCE OAuth, provider-specific header handling, craik auth login <provider> --mode=api-key|oauth|claude-cli, billing-surface status metadata, callback-safety CI, and current authentication docs through milestone issues #936, #937, #940, #938, #939, and #941. Release prep is tracked in #942.

Post-MVP stability · Professional agent runtime

Required outcome. Craik is stable enough for external teams to use for real multi-agent software-delivery workflows.

Graduation gate, not a scheduled release.

Ship a robust 0.x.0 MVP first, then continue shipping 0.x.0 releases until the bar below is met by real usage, documentation maturity, compatibility confidence, and security posture.

Required capabilities. MVP-readiness items are tracked in Robust MVP Roadmap before the first usable 0.x.0.

Stable core schemas

Migration path

For persisted state.

SemVer release process

Package publication

Security release process

Complete CLI/reference docs

Production Stigmem integration

Documented limits & failure modes

Runnable demo

Community contribution path

≥1 complete runner adapter end-to-end

Policy tests in CI

Public/internal boundary classifier

Provenance-aware documentation

Memory hygiene workflow

Work product classification

Decision record suggestions

Learning without self-trust

Confidence requirements before 1.0.0:

  1. At least one complete runner adapter has been used successfully on real workflows.
  2. Stigmem-backed memory has soaked on real projects.
  3. Persisted schema migrations have been exercised.
  4. Security and redaction behavior has been tested under realistic agent runs.
  5. Documentation is complete enough for external users without maintainer hand-holding.
  6. Community contribution and support expectations are clear.
  7. Known limitations are documented honestly.

Executable workstreams

Each workstream below becomes one or more GitHub milestones/issues. Documentation requirements are part of the definition of done.

0 · Project foundation

Scope: package metadata · Python 3.12+ skeleton · craik CLI · MIT license · governance files · dependency lock strategy · CI quality gates · package-name reservation or publication.

Validation: craik --version works · tests run in CI · lint/type checks run in CI · package metadata validates.

Docs: installation · quickstart stub · contribution guide updates · release/support note · limitations note for pre-0.1.0.

1 · Runtime contracts

Scope: task request · project profile · policy envelope · capability grant · capability receipt · case file · agent role · worker result · handoff · memory proposal · memory backend capabilities · contradiction report · work graph event · evidence reference · assumption · delegation point · intent lock · instruction distillation item · quality gate result · artifact classification.

Validation: schema fixtures · invalid fixture tests · JSON serialization tests · version field tests.

Docs: schema reference · examples for each contract · versioning and migration policy.
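To make the validation items for this workstream concrete, here is a minimal sketch of a versioned contract with a JSON round trip and rejection of an unsupported version. It uses a standard-library dataclass to stay self-contained; the field names, the `SUPPORTED_SCHEMA_VERSIONS` set, and the version format are assumptions for illustration, not the actual capability receipt schema.

```python
import json
from dataclasses import asdict, dataclass

# Assumption: versions are opaque strings and only "1" is known here.
SUPPORTED_SCHEMA_VERSIONS = {"1"}


@dataclass(frozen=True)
class CapabilityReceipt:
    """Hypothetical minimal receipt; a real contract would carry more fields."""

    schema_version: str
    capability: str
    granted: bool

    def to_json(self) -> str:
        # sort_keys keeps fixture output deterministic.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "CapabilityReceipt":
        data = json.loads(raw)
        version = data.get("schema_version")
        if version not in SUPPORTED_SCHEMA_VERSIONS:
            raise ValueError(f"unsupported schema_version: {version!r}")
        return cls(**data)
```

A valid fixture should survive `from_json(to_json(x)) == x`, and an invalid fixture (for example an unknown version) should fail loudly rather than load partially.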

2 · Local state and project registry

Scope: ~/.craik default home · CRAIK_HOME override · config/ · secrets/ · state/ · cache/ · logs/ · receipts/ · handoffs/ · case-files/ · projects/ · secure permissions where supported · SQLite store · project registry · immutable path config · project-local .craik/ opt-in only.

Validation: path resolver tests · permission tests · registry persistence tests · project-local opt-in tests.

Docs: configuring Craik home · local state layout reference · secrets handling guide.
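The home-directory rules in this workstream can be sketched as a small resolver: `CRAIK_HOME` wins when set, otherwise the default is `~/.craik`. The function names and the choice to tighten permissions only on `secrets/` are assumptions for illustration; the subdirectory names come from the scope list above.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

STATE_SUBDIRS = (
    "config", "secrets", "state", "cache", "logs",
    "receipts", "handoffs", "case-files", "projects",
)


def resolve_craik_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return CRAIK_HOME if set and non-empty, else ~/.craik."""
    env = os.environ if env is None else env
    override = env.get("CRAIK_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".craik"


def ensure_layout(home: Path) -> list[Path]:
    """Create the state subdirectories under home and return their paths."""
    created = []
    for name in STATE_SUBDIRS:
        path = home / name
        path.mkdir(parents=True, exist_ok=True)
        if name == "secrets":
            # Assumption: owner-only access; has limited effect on Windows.
            path.chmod(0o700)
        created.append(path)
    return created
```

Passing the environment as an argument keeps the path resolver tests independent of the process environment.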

3 · Policy, grants, redaction, receipts

Scope: strict / trusted-local / automation profiles · fail-open profile visibility · capability grants · immutable path protection · central redaction utility · shell/file/GitHub/memory grant enforcement · receipt persistence · policy denial receipts.

Validation: policy fixture tests · redaction tests · immutable-path tests · fail-open receipt tests · automation fail-closed tests.

Docs: policy profiles reference · fail-open guide · capability grants guide · redaction and secrets docs.
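As an illustration of what a central redaction utility might look like, the sketch below masks values that follow secret-looking key names and strings shaped like GitHub tokens. The patterns, the `[REDACTED]` marker, and the function name are assumptions; a real utility would need much broader coverage.

```python
import re

REDACTED = "[REDACTED]"

# Assumption: illustrative patterns only; real coverage would be broader.
_KEY_VALUE = re.compile(
    r"(?i)\b(\w*(?:api[_-]?key|token|password|secret)\w*)(\s*[=:]\s*)(\S+)"
)
_GITHUB_TOKEN = re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}\b")


def redact(text: str) -> str:
    """Mask secret-looking values while keeping key names readable."""
    text = _KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}{REDACTED}", text)
    return _GITHUB_TOKEN.sub(REDACTED, text)
```

Routing every receipt, log line, and handoff through one function like this is what makes redaction testable in a single place. The sketch is idempotent, so redacting already-redacted text leaves it unchanged.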

4 · Case files, intent, evidence, assumptions

Scope: task intent lock · repository state ingestion · docs and ADR discovery · default discovery exclusions for generated/dependency/build/cache/archive-heavy paths · project and user override rules · visible context-debt metadata · Stigmem/local fact loading · GitHub context placeholders · evidence references · assumption ledger · context budget metadata · stale-risk markers · context explanations · structured context requests · first-class unknowns · context debt tracking.

Validation: deterministic fixture output · evidence reference tests · assumption promotion tests · context inclusion/exclusion tests · default exclusion tests · override tests · stale-risk tests.

Docs: case file concept doc · using case files guide · evidence and assumptions guide · context budgeting guide · context discovery and exclusion guide.
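The default-exclusion and override behavior described above can be sketched as a pure path filter. The excluded directory names and the parameter names below are assumptions chosen for illustration; the roadmap does not list the actual defaults.

```python
from pathlib import PurePosixPath

# Assumption: example defaults for dependency, build, and cache paths.
DEFAULT_EXCLUDED_DIRS = frozenset(
    {"node_modules", ".git", "dist", "build", "__pycache__", ".venv"}
)


def is_discoverable(
    path: str,
    include_overrides: frozenset[str] = frozenset(),
    extra_excludes: frozenset[str] = frozenset(),
) -> bool:
    """Return True when no path component is in the effective exclusion set."""
    excluded = (DEFAULT_EXCLUDED_DIRS | extra_excludes) - include_overrides
    return not any(part in excluded for part in PurePosixPath(path).parts)
```

Here an include override beats both the defaults and extra excludes, which is one possible precedence; whichever rule Craik adopts, the override tests should pin it down explicitly.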

5 · Handoffs, self-audit, exit discipline

Scope: structured handoff · Markdown handoff · self-audit before handoff · incomplete-run handoff · handoff quality score · unresolved questions · next steps · receipt links · memory proposal links · context debt links.

Validation: handoff schema tests · self-audit checklist tests · quality score fixture tests · interrupted-run fixture tests.

Docs: handoff concept doc · writing handoffs guide · self-audit reference · recovery and incomplete-run guide.

6 · Memory backends and Stigmem integration

Scope: ephemeral backend · local backend · Stigmem backend · capability detection · health and metadata checks · fact query/list/get/write · provenance reads · optional recall · optional conflicts · local proposal model · memory diff · memory impact preview · source identity handling · source attestation handling · error mapping.

Validation: backend interface tests · local backend persistence tests · Stigmem integration tests against a local node · auth failure tests · optional-capability fallback tests · memory diff tests.

Docs: memory backend reference · connecting Stigmem guide · Stigmem compatibility matrix · memory proposal and promotion guide · memory impact preview guide.
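A memory diff can be illustrated with plain dictionaries. This sketch assumes facts are keyed by a stable identifier and compared by value; the real fact model, with provenance and scope, is richer than this.

```python
def memory_diff(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify fact keys as added, removed, or changed between two snapshots."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(key for key in shared if before[key] != after[key]),
    }
```

Running the same function against the current state and a proposed state is one way to produce an impact preview before any write happens.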

7 · GitHub adapter and demo workflow

Scope: GitHub auth detection · repository metadata · issues · PRs · changed files · check status · guarded GitHub comments/issues/PR creation · first Stigmem docs reconciliation demo.

Validation: mocked GitHub adapter tests · read-only fallback tests · permission failure tests · fixture demo run.

Docs: GitHub adapter guide · first Stigmem reconciliation demo · public/internal boundary guidance · troubleshooting guide.

8 · Work graph, contradictions, delegation

Scope: graph nodes and edges · task/handoff/fact/proposal/receipt/evidence/assumption/delegation/artifact nodes · contradiction reports · Stigmem conflict linking · local contradiction reports · human delegation points · approval/clarification/policy-override/memory-promotion/release-signoff requests.

Validation: graph export tests · contradiction lifecycle tests · delegation lifecycle tests · unresolved delegation block tests.

Docs: work graph concept doc · contradiction inbox guide · human delegation guide · graph export reference.
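A minimal sketch of a work graph with typed nodes, labeled edges, a JSON export, and a query for unresolved delegation points follows. The class, the `resolved` attribute, and the edge relation names are hypothetical; they only illustrate the shape that graph export and delegation block tests could exercise.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WorkGraph:
    nodes: dict[str, dict] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, kind: str, **attrs) -> None:
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, source: str, relation: str, target: str) -> None:
        # Reject edges that point at nodes the graph does not know about.
        for node_id in (source, target):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.edges.append((source, relation, target))

    def unresolved_delegations(self) -> list[str]:
        """Return ids of delegation nodes not yet marked resolved."""
        return sorted(
            node_id
            for node_id, node in self.nodes.items()
            if node["kind"] == "delegation" and not node.get("resolved", False)
        )

    def export_json(self) -> str:
        payload = {"nodes": self.nodes, "edges": [list(e) for e in self.edges]}
        return json.dumps(payload, sort_keys=True)
```

A run could refuse to proceed while `unresolved_delegations()` is non-empty, which is the behavior the unresolved delegation block tests would check.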

9 · Agent-native onboarding

Scope: craik onboard --project <project-id> · project model · active policies · ADRs and immutable paths · docs boundaries · recent handoffs · unresolved contradictions · stale-risk warnings · validation commands · Stigmem status · known traps · allowed next actions.

Validation: onboarding fixture tests · missing context tests · stale context tests · runner-readable output tests.

Docs: onboarding guide · known traps guide · project model concept doc.

10 · Runner adapters

Scope: runner adapter interface · Codex adapter · Claude adapter · Gemini adapter · runner capability matrix · policy-aware prompt compiler · runner metadata · normalized worker results · normalized handoffs · real-runner contract tests · runner trust profiles.

Validation: adapter interface tests · fixture contract tests · prompt compilation tests · runner capability matrix tests · real-runner smoke tests when credentials/tools are available.

Docs: runner adapter contract · Codex adapter guide · Claude adapter guide · Gemini adapter guide · prompt compiler reference · runner capability matrix reference.

11 · Single-agent execution loop

Scope: run id and run status model · task run state machine · plan/act/observe/evaluate/continue/stop phases · runner step contract · bounded case-file context with default exclusions and overrides · max-iteration limit · timeout and budget limits · intent-lock stop-condition enforcement · approval and grant checks before side effects · step receipts · observed output capture · memory proposal hooks · handoff on completion/block/failure/interruption · run resume · run recovery · agent exit discipline.

Validation: state-machine transition tests · max-iteration and timeout tests · budget exhaustion tests · stop-condition enforcement tests · approval-block tests · receipt-per-step tests · interrupted-run resume tests · handoff-on-failure tests · runner fixture tests · polluted-context fixture tests.

Docs: single-agent execution loop concept doc · running tasks guide · run state reference · resume and recovery guide · loop policy guide · context discovery override guide.
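The phase cycle and max-iteration limit can be sketched as a transition table plus a bounded loop. The allowed transitions, status strings, and function names are assumptions for illustration; the actual run state machine also covers timeouts, budgets, approvals, and resume.

```python
from typing import Callable

# Assumption: one plausible transition table for the phases named above.
ALLOWED_TRANSITIONS = {
    "plan": {"act", "stop"},
    "act": {"observe", "stop"},
    "observe": {"evaluate"},
    "evaluate": {"continue", "stop"},
    "continue": {"plan"},
    "stop": set(),
}


def advance(current: str, nxt: str) -> str:
    """Return nxt if the transition is legal, otherwise raise ValueError."""
    if nxt not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {nxt}")
    return nxt


def run_loop(is_done: Callable[[int], bool], max_iterations: int) -> tuple[str, list[str]]:
    """Cycle through the phases until is_done() is true or the limit is hit."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    phase, trace, iterations = "plan", ["plan"], 0
    while True:
        for nxt in ("act", "observe", "evaluate"):
            phase = advance(phase, nxt)
            trace.append(phase)
        iterations += 1
        if is_done(iterations):
            status = "completed"
            break
        if iterations >= max_iterations:
            status = "max_iterations_exceeded"
            break
        for nxt in ("continue", "plan"):
            phase = advance(phase, nxt)
            trace.append(phase)
    trace.append(advance(phase, "stop"))
    return status, trace
```

Keeping the transition table as data makes the state-machine transition tests a matter of enumerating legal and illegal pairs.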

12 · Multi-agent coordination

Scope: orchestrator · specialist tasks · parallel read-only investigations · implementer/verifier/adversarial-reviewer/policy-reviewer/docs-reviewer/memory-curator/release-reviewer/adjudicator roles · typed worker results · cross-agent review protocol · structured agent debate · scope-change protocol.

Validation: child task graph tests · typed worker result tests · debate/adjudication fixture tests · unresolved-contradiction block tests · scope-change proposal tests.

Docs: multi-agent workflows guide · role reference · review protocol guide · structured debate guide.

13 · Runtime instruction distillation

Scope: declared instruction source registry · AGENTS.md · CLAUDE.md · GEMINI.md · HERMES.md · SKILLS.md · .cursorrules · .github/copilot-instructions.md · .codex/instructions.md · source hash tracking · line/range provenance · extraction categories · distillation proposals · stale distillation invalidation · instruction contradiction reports · promotion approval.

Validation: Markdown fixture tests · source hash invalidation tests · extraction category tests · contradiction fixture tests · approval/promotion tests.

Docs: instruction distillation concept doc · declaring instruction sources guide · distillation review guide · instruction categories reference.
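Source hash tracking with line-range provenance can be sketched as follows: each distilled item records the SHA-256 of the source it came from, and any change to the source marks the item stale. The dataclass fields and function names are hypothetical, and treating any whole-file change as stale is a simplifying assumption.

```python
import hashlib
from dataclasses import dataclass


def source_hash(text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DistilledInstruction:
    source_path: str
    line_start: int  # 1-indexed, inclusive
    line_end: int    # 1-indexed, inclusive
    source_sha256: str
    text: str


def distill_lines(path: str, source_text: str, line_start: int, line_end: int) -> DistilledInstruction:
    """Capture a line range together with the hash of the whole source."""
    lines = source_text.splitlines()
    excerpt = "\n".join(lines[line_start - 1:line_end])
    return DistilledInstruction(path, line_start, line_end, source_hash(source_text), excerpt)


def is_stale(item: DistilledInstruction, current_source_text: str) -> bool:
    """An item is stale when the source no longer matches the recorded hash."""
    return item.source_sha256 != source_hash(current_source_text)
```

A stale item would go back through extraction and approval rather than being trusted, in line with the rule that source remains canonical.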

14 · Quality gates and freshness

Scope: runtime critic · red team mode · evidence coverage score · tool result attestation · knowledge freshness probes · evidence expiration rules · negative knowledge · runtime memory hygiene · decision record suggestions · learning without self-trust.

Validation: critic fixture tests · red team policy tests · evidence coverage tests · tool-result source tests · freshness probe tests · memory hygiene proposal tests.

Docs: quality gates guide · freshness and staleness guide · negative knowledge guide · memory hygiene guide · decision record suggestion guide.

15 · Budgets, quotas, and operational bounds

Scope: context token budgets · model spend budgets · wall-clock budgets · shell command count · GitHub write count · memory write count · parallel worker count · retry count · approval count · budget receipts · budget escalation/block behavior.

Validation: budget accounting tests · exhaustion behavior tests · fail-open budget receipt tests · policy profile budget tests.

Docs: budget and quota guide · policy budget reference · troubleshooting budget exhaustion.
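Budget accounting with block-on-exhaustion can be sketched as a small ledger that records a receipt for every charge and every denial. The class names, receipt fields, and the choice to raise on exhaustion are assumptions; escalation behavior is not modeled here.

```python
from dataclasses import dataclass, field


class BudgetExhausted(RuntimeError):
    """Raised when a charge would exceed the configured limit."""


@dataclass
class BudgetLedger:
    limits: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)
    receipts: list[dict] = field(default_factory=list)

    def charge(self, name: str, amount: int = 1) -> int:
        """Record usage and return the remaining budget, or block and raise."""
        limit = self.limits[name]
        spent = self.used.get(name, 0)
        if spent + amount > limit:
            self.receipts.append(
                {"budget": name, "event": "blocked", "requested": amount, "used": spent, "limit": limit}
            )
            raise BudgetExhausted(f"{name}: {spent}+{amount} exceeds {limit}")
        self.used[name] = spent + amount
        self.receipts.append(
            {"budget": name, "event": "charged", "requested": amount, "used": self.used[name], "limit": limit}
        )
        return limit - self.used[name]
```

Writing the denial receipt before raising means exhaustion leaves an audit trail even when the caller does not handle the exception.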

16 · Recovery and continuity

Scope: recovery mode · partial receipt loading · scratchpad restore · changed file detection · unfinished handoff recovery · unresolved delegation restore · "what changed since last time" deltas · run delta summaries.

Validation: interrupted-run fixtures · recovery command tests · delta calculation tests · partial handoff tests.

Docs: recovery guide · run deltas guide · interruption handling reference.

17 · Artifact and documentation intelligence

Scope: work product classification · provenance-aware documentation · public/internal boundary classifier · generated doc evidence links · docs stale-state detection · release note classification · audit artifact classification.

Validation: classifier fixture tests · public/internal boundary tests · provenance link tests · stale doc fixture tests.

Docs: artifact classification reference · provenance-aware docs guide · public/internal boundary guide · docs maintenance guide.

18 · Skills, plugins, and community ecosystem

Scope: skill package format · project-scoped skills · global skills · community skills layout · plugin descriptor format · probationary plugin policy · plugin capability grants · plugin receipts · adapter package guidance · reference integrations · marketplace/index format decision.

Validation: skill loader tests · plugin descriptor validation tests · probationary policy tests · plugin receipt tests · community package fixture tests.

Docs: skills concept doc · writing skills guide · community skills guide · plugin contract reference · writing plugins guide · plugin security guide · marketplace/index guide.

19 · Operator experience

Scope: TUI/dashboard decision · work graph explorer · handoff viewer · receipt viewer · contradiction inbox · evidence and assumption views · delegation queue · budget view · instruction distillation view · quality gate view · memory impact preview · known traps view · run delta view.

Validation: UI/TUI smoke tests · nonblank rendering checks · fixture state rendering tests · accessibility and keyboard navigation checks for UI surfaces.

Docs: operator guide · dashboard/TUI guide · view reference · troubleshooting guide.

20 · Operator integrations and always-on gateway

Scope: gateway daemon mode · setup wizard · diagnostics command · update guidance · channel adapter contract · first messaging channel adapter · inbound identity and pairing model · channel allowlists · channel-scoped policy envelopes · webhook ingress · scheduled automations · gateway receipts.

Validation: daemon lifecycle tests · setup wizard fixture tests · diagnostics failure-mode tests · webhook signature tests · channel identity mapping tests · scheduled task creation tests · gateway receipt tests · v0.8.0 gateway pipeline e2e test.

Docs: gateway guide · setup guide · diagnostics guide · channel adapter reference · webhook reference · scheduler guide · gateway security guide.
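Webhook signature tests usually exercise an HMAC check. The sketch below shows a common pattern, HMAC-SHA256 over the raw request body with a constant-time comparison. The `sha256=` prefix mirrors the format GitHub uses for its signature header and is an assumption here, as are the function names; the gateway's actual signature scheme is not specified on this page.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Return 'sha256=<hex digest>' for the raw payload bytes."""
    return "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, header_value: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = sign_payload(secret, payload).encode("utf-8")
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

Verification has to run on the exact bytes received, before any JSON parsing or re-serialization, or valid requests will fail the check.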

21 · Persistent agent runtime, providers, and sandboxes

Scope: persistent Craik agent runtime · craik / craik run launch UX · agent lifecycle commands · agent session state contract · agent session event contract · provider authentication flow for OpenAI, Anthropic, Gemini, and local models · guided provider setup UX · provider-backed agent sessions · interactive prompt loop · run / agent boundary decision · model provider registry · provider switching UX · provider failover policy · provider budget and quota links · Gemini provider/runtime path · local model routing · local model presets · provider certification matrix · failure recovery · persistent-agent security model · end-to-end launch demo · MCP client integration · MCP server/export decision · sandbox backend contract · local process backend · Docker sandbox backend · SSH or remote shell backend · browser/tool execution boundary · environment capability receipts.

Tracking: no v0.9.0 roadmap tile is silently untracked.

Validation: agent launch tests · lifecycle command tests · provider authentication tests · provider session persistence tests · agent event persistence tests · OpenAI/Anthropic/Gemini/local model routing tests · guided setup tests · local model preset tests · interactive prompt loop tests · receipt and handoff linkage tests · interruption and exit behavior tests · failure recovery tests · provider certification matrix checks · provider registry tests · provider failover tests · MCP compatibility fixture tests · sandbox policy tests · backend isolation tests · environment receipt tests · budget linkage tests · end-to-end launch demo test.

Docs: persistent agent runtime guide · agent lifecycle reference · provider authentication guide · local model setup guide · provider certification matrix · provider routing guide · provider config reference · MCP integration guide · sandbox backend reference · persistent-agent security guide · execution environment security guide.

22 · Self-improving skills and learning loops

Scope: skill performance telemetry · autonomous skill proposal creation · skill improvement proposals · skill eval/replay harness · periodic memory review nudges · user/team preference facts · learning-loop receipts · approval gates for promoted skills · rollback path for bad skill updates · training/trajectory export format · trajectory compression or summarization.

Validation: skill proposal tests · skill eval fixture tests · replay determinism tests · approval gate tests · rollback tests · trajectory export tests · learning-loop receipt tests.

Docs: skill improvement guide · learning-loop policy guide · skill eval reference · trajectory export reference · rollback guide.

23 · Multimodal and companion surfaces

Scope: voice input/output posture · speech-to-text adapter contract · text-to-speech adapter contract · multimodal artifact references · desktop companion app decision · mobile companion app decision · live visual workspace/canvas decision · work graph to visual workspace bridge · accessibility requirements.

Validation: multimodal artifact schema tests · redaction tests for transcript and media metadata · accessibility checks for companion surfaces · visual workspace smoke tests where implemented · adapter contract tests.

Docs: multimodal posture doc · voice adapter reference · companion app security guide · visual workspace guide · accessibility checklist.

24 · Migration, i18n, and ecosystem compatibility

Scope: adjacent-tool import/migration assessment · multi-agent workflow import/migration assessment · import dry-run reports · memory/skill/config migration maps · secret migration policy · ecosystem compatibility guide · adjacent runtime bridge decision · multi-agent workflow bridge decision · locale/i18n framework · translated docs strategy.

Validation: import dry-run fixture tests · migration map tests · secret redaction tests · bridge compatibility smoke tests where implemented · locale fallback tests · translated docs link tests where applicable.

Docs: migration guide · import dry-run reference · secret migration policy · ecosystem compatibility guide · i18n guide · bridge decision records.

v0.1.0 issue cut

The initial issue set covers only the v0.1.0 gate and any contracts needed to avoid rework.

  1. Scaffold Python package and craik CLI.
  2. Add core Pydantic schemas and fixtures.
  3. Implement ~/.craik path resolver and local state layout.
  4. Implement SQLite local store.
  5. Implement project registry.
  6. Implement strict / trusted-local / automation policy profiles.
  7. Implement capability grants and immutable path protection.
  8. Implement central redaction utility.
  9. Implement receipt store.
  10. Implement case file assembler with evidence, assumptions, and context budget metadata.
  11. Implement intent lock.
  12. Implement handoff writer and self-audit checklist.
  13. Implement local memory backend and proposal flow.
  14. Implement Stigmem backend minimum compatibility.
  15. Implement memory diff and memory impact preview foundations.
  16. Implement GitHub read adapter.
  17. Implement work graph export.
  18. Implement contradiction report model.
  19. Implement agent-native onboarding.
  20. Implement policy test harness and core policy tests.
  21. Implement Stigmem documentation reconciliation demo.
  22. Build initial docs tree and publish v0.1.0 user/concept/reference docs.

Each issue includes: implementation checklist · test/validation checklist · documentation checklist · security/policy impact · Stigmem fact update requirement when relevant.

Goal issue workflow

For each 0.x.0 goal issue, implementation is not complete until the change has a pull request and that pull request has passed required CI. Push the working branch once implementation, tests, docs, and local validation are complete, then watch the PR checks until they reach a terminal state. If any required check fails, fix the failure on the same PR branch, push again, and wait for the checks again before closing the issue or marking the goal complete.

Do not close milestone issues from local validation alone. The goal workflow requires the PR branch to be current on GitHub and the required PR gates to be green. If the agent opened the PR and the checks are clean, the agent merges the PR, verifies the merge landed on the base branch, prunes stale local and remote branches, and only then moves to the next goal.
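The decision at the heart of this workflow can be sketched as a pure function over the states of the required checks. The state names, action names, and the handling of a PR the agent did not open are assumptions for illustration; the workflow text above only specifies the agent-opened case.

```python
# Assumption: check states that have not yet reached a terminal result.
PENDING_STATES = {"queued", "in_progress", "pending"}


def next_action(required_checks: dict[str, str], agent_opened_pr: bool) -> str:
    """Map required-check states to the next step in the goal workflow."""
    states = set(required_checks.values())
    # No reported checks yet is treated as "still waiting" (assumption).
    if not required_checks or states & PENDING_STATES:
        return "wait"
    if states == {"success"}:
        # Only the agent-opened case is defined by the workflow text.
        return "merge_and_verify" if agent_opened_pr else "leave_for_maintainer"
    return "fix_on_same_branch_and_push"
```

After a fix is pushed, the same function is applied to the new check run, which gives the repeat-until-green loop without any special casing.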

Documentation definition of done

What concept changed?

What user workflow changed?

What CLI / API / config changed?

What policy or security behavior changed?

What examples should exist?

What limitations apply?

What facts should future agents know?

For implementation issues, docs are updated in the same PR unless the issue is explicitly internal-only scaffolding.

Release definition of done

Passing tests

Passing lint / type checks

Generated or updated CLI/reference docs

Updated roadmap state

Updated limitations

Security notes

Migration notes when local state or schemas change

Runnable demo status

Memory update (optional release-state memory, only when Stigmem is available; not a release gate)

What's next