Roadmap
What you'll find here
The full Craik roadmap: rules every roadmap item must satisfy, the documentation model, the release-gate sequence from v0.1.0 through v0.12.0 (plus a post-MVP stability gate), and the 24 executable workstreams that turn those gates into shippable PRs.
Executable by design.
Every roadmap item must produce implementation, tests or validation, and documentation. Craik should not ship features that only exist as code or only exist as strategy. The smallest useful runtime ships first; everything else builds on top.
Roadmap rules
- Every feature has docs. User-facing behavior requires guide docs · runtime contracts require reference docs · policy behavior requires security/governance docs · adapter behavior requires integration docs.
- Docs ship with implementation. A feature is not done until its docs, examples, and validation guidance are merged.
- Strict by default. New capabilities must respect policy envelopes, grants, redaction, receipts, and memory-write defaults.
- Evidence before memory. No durable assertion or Stigmem write without evidence, provenance, and scope.
- Source remains canonical. Derived memory, distilled instructions, generated docs, and summaries must cite source artifacts.
- CLI first, UI later. CLI workflows prove the runtime before dashboard work broadens the surface.
- Stigmem is the reference substrate. Local mode exists for onboarding and tests, but full durable behavior assumes Stigmem.
Documentation model
Craik docs borrow the most useful patterns from three product lineages.
From Stigmem
Explicit concept docs · protocol & contract reference · generated API/CLI reference once implementation exists · roadmap & limitations · security & governance · durable examples tied to real workflows · clear public/internal boundaries.
From local agent runtimes
Practical setup · workspace/project mental model · tool/skill/plugin docs · channel/adapter docs · operator-friendly examples.
From multi-agent orchestration tools
CLI-first user guides · configuration docs · skills docs · security/approval docs · multi-agent workflow docs · exact dependency & supply-chain notes where relevant.
Target docs tree:
docs/
  concepts/
    durable-agent-runtime.md
    project-models.md
    case-files.md
    handoffs.md
    receipts.md
    work-graph.md
    memory-and-stigmem.md
    governance.md
    instruction-distillation.md
    skills-and-plugins.md
  guides/
    installation.md
    quickstart.md
    first-stigmem-reconciliation-demo.md
    configuring-craik-home.md
    connecting-stigmem.md
    using-case-files.md
    writing-handoffs.md
    running-policy-tests.md
    runner-adapters.md
    community-skills.md
    community-plugins.md
  reference/
    cli.md
    config.md
    schemas.md
    policy-profiles.md
    memory-backends.md
    runner-adapter-contract.md
    plugin-contract.md
  security/
    index.md
    redaction.md
    secrets.md
    capability-grants.md
    fail-open-profiles.md
  roadmap.md
  limitations.md
Root-level project governance files remain authoritative for contribution, security disclosure, trademarks, and maintainership.
MVP release strategy
0.x.0 MVP first, 1.0.0 later.
1.0.0 is a later stability signal after real-world usage, compatibility confidence, and security soak. The MVP still pulls forward readiness work that affects trust: migrations, release hygiene, package publication, generated docs, security process, provider certification, public/internal boundaries, provenance tracking, memory hygiene, and CI/CD depth.
The checkable MVP plan lives in Robust MVP Roadmap. The release gates below describe the contract and feature build-up through v0.12; the MVP roadmap turns those surfaces into release-quality workflows.
Release gates
Craik stays on 0.x.0 releases until the maintainers are confident the product is stable enough to call 1.0. The gates below are sequencing targets. They do not promise that 1.0.0 follows immediately after the last listed gate. Additional 0.x.0 releases are added whenever the product needs more soak time, compatibility work, security hardening, or real-user validation.
v0.1.0 · Governed agent-runtime substrate
Required outcome. A user can register a real repo, authenticate via OIDC, assemble a governed case file, compile runner prompts, execute provider requests against OpenAI Responses / Anthropic Messages / OpenAI-compatible Chat Completions adapters (fixture-backed by default, live opt-in), resolve provider credentials through typed profiles or workload-federated brokering, record receipts that name both the operator identity and the credential identity, produce durable handoffs, propose memory updates, and export the work graph. Policy can constrain which operators and which credentials a task may use; credential authorization is itself a receipted graph.
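The loopback+PKCE variant of OIDC operator login depends on the S256 code challenge defined in RFC 7636. The minimal stdlib sketch below shows that derivation. The function names are hypothetical and do not describe Craik's actual login module.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Return a random RFC 7636 code verifier (43-128 unreserved characters)."""
    # token_urlsafe yields unpadded base64url text; 32 random bytes -> 43 characters.
    return secrets.token_urlsafe(num_bytes)


def s256_challenge(verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(ASCII(verifier))) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The verifier stays on the operator's machine. Only the challenge travels with the authorization request, and the verifier is presented later at the token endpoint, so an intercepted authorization code cannot be redeemed on its own.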
Required capabilities:
Python 3.12+ package
And craik CLI.
MIT license
And governance files.
Local home
~/.craik default · CRAIK_HOME override.
Pydantic contracts
SQLite local state
Project registry
Policy profiles
Strict · trusted-local · automation.
Capability grants
Central redaction utility
Receipt store
Handoff writer
Memory backends
Local backend + Stigmem read client.
Case-file assembler
Evidence · assumptions · context budget · default exclusions with project/user overrides.
Intent locks
And self-audit before handoff.
Memory proposals
With diff and impact preview.
Work graph export
Read-only GitHub adapter
Stigmem demo
Docs reconciliation demo + behavior-test acceptance.
Provider transport
FixtureTransport · HTTPTransport over stdlib urllib with SSE.
Provider families
OpenAI Responses · Anthropic Messages · OpenAI-compatible Chat Completions.
Provider features
Tool-call round-trip · streaming chunk capture · retry · timeout · cancellation.
Single-agent loop
Task run state machine · plan/act/observe/evaluate phases · runner step contract · per-step receipts and policy gates · output capture · memory proposal creation · handoff on completion/block/failure.
Typed credentials
auth-profiles.json with <provider_family>:<name> IDs.
Credential sources
Env-var API key · local-CLI OAuth fallback · vendor-CLI subprocess bridge · external secret manager reference · marker · Stigmem-backed reference.
Credential pool
Rotation · failover · per-profile health.
OIDC operator login
Device-code · loopback+PKCE · IdP discovery · JWKS-validated ID tokens · refresh.
Workload identity
CI · Kubernetes · generic file · env-var.
RFC 8693 token exchange
Federated credential brokering.
Operator session
operator-session.json · craik login · craik logout · craik whoami.
Credential CLI
craik auth list / add / remove / test / status / approve / grant.
Doctor integration
Credential health surfaced in craik doctor.
Identity on every call
Operator and credential identity bound to every provider call and receipt.
Policy-bound identity
required_operator · allowed_operator_groups · allowed_credential_kinds · allowed_credential_profiles.
Approval-gated first use
For any credential profile.
Authorization binding
Operator-credential receipted grant chain.
Expiry as evidence
Credential expiry surfaced as evidence/risk in case files.
Per-credential redaction
Per-agent isolation
Credential and operator isolation in handoff records (consumed by the v0.3.0 multi-agent runtime).
Explicitly not required for v0.1.0:
Resumable runs
Across process crashes.
Real sandbox tool execution
Provider budget enforcement
At the call boundary.
Schema migration framework
Multi-agent runtime
Beyond handoff identity bookkeeping.
Instruction distillation pipeline
Operator UI / TUI
Gateway daemon
And channel adapters.
MCP client/server
Skill / plugin runtime
Learning loops
Companion surfaces
Migration tooling
v0.2.0 · Durable execution continuity
Required outcome. A run interrupted at any phase boundary can be resumed cleanly with no duplicated side effects. Tool calls execute inside at least one real sandbox backend, gated per call. Budgets are enforced at the call boundary, not just declared. Persistent state survives schema changes via a documented migration path.
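Resuming without duplicated side effects depends on step keys that are identical across process restarts. The sketch below is one way to get that property: a SHA-256 over a canonical JSON encoding of the step identity. The key recipe and the in-memory ledger are assumptions. Craik records its keys in persisted run state, not in a Python dict.

```python
import hashlib
import json
from collections.abc import Callable


def step_idempotency_key(run_id: str, phase: str, step_index: int, inputs: dict) -> str:
    """Derive a deterministic key: same run, phase, position, and inputs -> same key."""
    canonical = json.dumps(
        {"run": run_id, "phase": phase, "step": step_index, "inputs": inputs},
        sort_keys=True,           # dict ordering must not change the key
        separators=(",", ":"),    # no incidental whitespace
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SideEffectLedger:
    """Hypothetical in-memory stand-in for persisted step results."""

    def __init__(self) -> None:
        self._results: dict[str, object] = {}

    def run_once(self, key: str, action: Callable[[], object]) -> object:
        # On replay, return the recorded result instead of repeating the side effect.
        if key in self._results:
            return self._results[key]
        result = action()
        self._results[key] = result
        return result
```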
The execution continuity slices have landed: #552 covers phase-boundary resume and deterministic step idempotency keys. #554 covers per-run wall-clock budget enforcement before new phase or tool rounds. #556 covers provider token budget ledger updates and interruption before additional provider calls once the budget is exhausted. #559 covers operator-facing run inspection, resume, and cancellation commands. #561 covers runtime exit-discipline checks persisted at the handoff boundary. #563 covers tool-result attestations for dispatched provider tool calls. #565 covers registered local-process sandbox execution for shell tool calls. #567 covers cancellation propagation into in-flight local-process sandbox commands. #569 covers the registered local-store migration runner and example migration. #571 covers the CLI run-delta view for persisted continuity state.
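The forward-only migration runner from #569 can be pictured with SQLite alone. The sketch below is a hypothetical illustration: it stores the schema version in PRAGMA user_version, registers migrations by target version, and never downgrades. The registry layout and both example migrations are assumptions. The connection must be opened with isolation_level=None so the runner controls each transaction explicitly.

```python
import sqlite3
from collections.abc import Callable

Migration = Callable[[sqlite3.Connection], None]

# Migrations keyed by the schema version they produce (hypothetical layout).
MIGRATIONS: dict[int, Migration] = {}


def migration(version: int) -> Callable[[Migration], Migration]:
    """Register a forward-only migration that upgrades the store to `version`."""
    def register(fn: Migration) -> Migration:
        if version in MIGRATIONS:
            raise ValueError(f"duplicate migration {version}")
        MIGRATIONS[version] = fn
        return fn
    return register


@migration(1)
def create_metadata(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


@migration(2)
def add_store_marker(conn: sqlite3.Connection) -> None:
    conn.execute("INSERT INTO metadata (key, value) VALUES ('store_created', 'true')")


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in ascending order and return the resulting version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in sorted(v for v in MIGRATIONS if v > current):
        conn.execute("BEGIN")
        try:
            MIGRATIONS[version](conn)
            conn.execute(f"PRAGMA user_version = {version}")  # int from the registry
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # the schema and version stay at the last good step
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```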
Resumable interrupted runs
Shipped: interrupted runs reopen from persisted phase outputs and continue at the next unfinished phase.
Step-level idempotency keys
Shipped: stable keys are recorded in run state and runner step context to avoid duplicated phase outputs and side effects on replay.
Time controls
Shipped: per-run wall-clock budgets interrupt before the next phase or tool round when exhausted.
Provider budget enforcement
Shipped: provider token budgets are decremented from usage metadata and interrupt before the next provider call when exhausted.
Run inspection & recovery
Shipped: craik run show, craik run resume, and craik run cancel expose persisted continuity state.
Agent exit discipline
Shipped: handoff creation persists exit-discipline checks so missing validation, risks, or next steps are runtime state.
Tool result attestation
Shipped: dispatched tool calls persist hashed attestations linked to the side-effect receipt and replay message.
One real sandbox backend
Shipped: local_process executes registered command references through subprocess.run without shell expansion when the loop is configured with a sandbox backend.
Sandbox cancellation
Shipped: local-process sandbox commands poll a cancellation event, terminate in-flight processes, and replay a cancelled tool result.
Schema migration framework
Shipped: local-store migrations run through a registered, forward-only migration runner with an example metadata migration.
Run delta view
Shipped: craik run delta renders persisted run-delta records and linked recovery sessions as an operator view or JSON.
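The local_process backend and its cancellation behavior can be approximated as follows. The roadmap says the backend uses subprocess.run, but subprocess.run offers no hook for stopping a process partway through. This sketch therefore swaps in subprocess.Popen and polls a threading.Event between short communicate() waits. The command registry and the result dictionary shape are hypothetical.

```python
import subprocess
import sys
import threading

# Hypothetical registry: tool calls name a command reference, never raw shell text.
REGISTERED_COMMANDS: dict[str, list[str]] = {
    "python-version": [sys.executable, "--version"],
    "slow-task": [sys.executable, "-c", "import time; time.sleep(30)"],
}


def run_registered(ref: str, cancel: threading.Event, timeout: float = 30.0, poll: float = 0.1) -> dict:
    """Run a registered command, polling `cancel` so in-flight work can be stopped."""
    argv = REGISTERED_COMMANDS[ref]  # KeyError for anything not registered
    # An argv list with the default shell=False means no shell expansion happens.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    waited = 0.0
    while True:
        try:
            out, err = proc.communicate(timeout=poll)
        except subprocess.TimeoutExpired:
            waited += poll
            if not cancel.is_set() and waited < timeout:
                continue  # still running; poll again
            proc.terminate()
            try:
                proc.communicate(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.communicate()
            status = "cancelled" if cancel.is_set() else "timeout"
            return {"status": status, "returncode": proc.returncode}
        status = "ok" if proc.returncode == 0 else "failed"
        return {"status": status, "returncode": proc.returncode, "stdout": out, "stderr": err}
```

Retrying communicate() after TimeoutExpired is the pattern the subprocess documentation recommends, and it does not lose buffered output. A "cancelled" result is what a runner would replay as the tool result.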
v0.3.0 · Multi-agent review and coordination
Required outcome. A handoff produced by agent A can be consumed by agent B as the starting state of a new governed run. Two agents working against the same project are coordinated via the work graph and intent locks without colliding. Disagreement between agents produces structured debate artifacts with receipted resolution.
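Collision avoidance through intent locks comes down to an overlap test between the scope a run requests and the scopes other active runs already hold. The sketch below assumes that scopes are project-relative path prefixes compared component by component. That is an illustrative assumption; the roadmap does not define the scope format. Each returned conflict is what a caller would turn into a denial receipt.

```python
from pathlib import PurePosixPath


def scopes_overlap(a: str, b: str) -> bool:
    """Two path scopes overlap when one equals or contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def check_intent_locks(requested: list[str], active_locks: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Return (run_id, held_scope, requested_scope) conflicts; empty means no collision."""
    conflicts = []
    for run_id, held_scopes in active_locks.items():
        for held in held_scopes:
            for wanted in requested:
                if scopes_overlap(held, wanted):
                    conflicts.append((run_id, held, wanted))
    return conflicts
```

Comparing path components rather than raw string prefixes keeps src/craik from colliding with a sibling such as src/craikx.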
Handoff consumption
Shipped first slice: craik task resume --from-handoff=<id> creates a new task, case file, and pending run with source handoff provenance and explicit consumer identity.
Role-based dispatch
Shipped first slice: provider-backed runs can dispatch implementer · verifier · adversarial reviewer · policy reviewer · docs reviewer · memory curator · adjudicator roles with policy allow-lists, dispatch receipts, and run metadata.
Multi-agent message contract
Shipped first slice: craik agent-message send and craik agent-message receive persist authenticated send/receive receipts and link to task, run, handoff, and role identities.
Concurrent run coordination
Shipped first slice: simultaneous loops on the same project are checked against active intent-lock scopes before new phases or tool dispatch, and overlapping scopes produce denial receipts.
Structured debate runtime
Shipped first slice: role-linked positions become typed debate turns, summaries preserve agreement or disagreement, and resolution records an adjudication or human-delegation receipt.
Cross-agent review protocol
Shipped first slice: review requests target worker results, handoffs, or debate summaries; review results carry typed findings, receipts, and source artifact links without mutating the reviewed artifact.
Human delegation at runtime
Shipped first slice: craik delegation pause interrupts a run with a receipted delegation request, craik delegation resolve records accepted/rejected/cancelled responses, and existing run resume continues from the interrupted boundary.
Scope-change protocol
Shipped first slice: discovered scope outside the active intent lock interrupts the run, persists a receipted request, and craik scope-change decide requires an explicit expand / sibling / handoff / denial decision.
Live work graph
Shipped first slice: v0.3.0 coordination artifacts persist work-graph events, and operators can query the active graph before final export.
Per-agent isolation enforced
Shipped first slice: handoff consumers record their own credential/operator assignment, producer identity reuse is denied by default, and intentional continuation requires an explicit flag plus rationale.
v0.4.0 · Runtime instruction distillation
Required outcome. Declared instruction files in a repo are ingested into typed, provenance-linked distillation items with categorized extraction, stale invalidation, contradiction surfacing, and an approval flow. Approved distillations participate in case files and prompt compilation as first-class evidence.
Source registry
Shipped first slice: declared sources are registered explicitly with receipts, project confinement, symlink escape protection, and active source lists.
Source ingestion
Shipped first slice: AGENTS.md · CLAUDE.md · GEMINI.md · HERMES.md · SKILLS.md · .cursorrules · .github/copilot-instructions.md · .codex/instructions.md · declared policy docs parse through a project-confined pipeline.
Source hash tracking
Shipped first slice: newline-normalized SHA-256 snapshots track new, unchanged, changed, missing, and oversize source states with stale-invalidation input.
Line/range provenance
Shipped first slice: extracted statements retain source snapshot IDs, line/column ranges, summaries, and canonical excerpt hashes.
Categorized extraction
Shipped first slice: deterministic proposal creation covers instruction · policy · preference · command · boundary · handoff-rule · memory-rule · security-rule · stale-risk with unclassified warnings.
Inter-source contradictions
Shipped first slice: normalized cross-source conflicts create contradiction reports while same-source and stale candidates are skipped.
Approval flow
Shipped first slice: governing constraints require explicit operator approval, receipt HMAC verification, active-session identity binding, and override rationale for stale or contradicted proposals.
Case-file integration
Shipped first slice: governing distillations load as first-class evidence with provenance ranges and approval receipt snapshots.
Prompt compilation
Shipped first slice: governing distillations render in one sanitized, deterministic Active instruction constraints section with stale-exclusion warnings.
Distillation CLI
Shipped first slice: craik instructions register / ingest / list / approve / reject / show drives the lifecycle through the active operator session.
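The approval flow above pairs receipt HMAC verification with active-session identity binding. The sketch below is a minimal version of that pairing. The receipt fields, the canonical JSON encoding, and the key handling are assumptions, not Craik's receipt format.

```python
import hashlib
import hmac
import json


def canonical_bytes(receipt: dict) -> bytes:
    # Deterministic serialization so signer and verifier MAC identical bytes.
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(receipt: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(receipt), hashlib.sha256).hexdigest()


def verify_receipt(receipt: dict, mac: str, key: bytes) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_receipt(receipt, key), mac)


def verify_approval(receipt: dict, mac: str, key: bytes, active_operator: str) -> bool:
    """Accept only an untampered receipt issued to the operator of the active session."""
    return verify_receipt(receipt, mac, key) and receipt.get("operator") == active_operator
```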
v0.5.0 · Quality, continuity, and recovery
Required outcome. Craik helps agents recover, improve handoffs, avoid stale context, and explain what changed between runs.
Recovery mode
Ready: #636. Recovery sessions and run deltas are persisted, HMAC-protected in the local store, and exposed through operator-gated craik run recover / craik run delta.
Runtime critic
Ready: #637. Reviewable, non-authoritative critic findings are captured through production helpers and craik review critic.
Red team mode
Ready: #638. Red-team findings are durable quality evidence captured through craik review red-team without becoming privileged instructions.
Handoff quality score
Ready: #640. Handoff creation persists deterministic score bands and names blocking reasons for poor handoffs.
Evidence coverage score
Ready: #639. Handoff creation persists missing evidence ids and weak claims as inspectable gaps.
Context debt tracking
Ready: #641. Omitted, stale, missing, and unresolved context becomes queryable debt with remediation state.
Tool result attestation
Ready: #643. Observed tool outputs are attested with trust class, evidence or receipt links, expiry, output-hash verification, and local HMAC integrity metadata.
Knowledge freshness probes
Ready: #644. Fresh, expiring, expired, and missing probes are captured through production helpers and produce stale-risk warnings.
Evidence expiration rules
Ready: #642. Expired and missing attestations are classified before reuse.
Known traps
Ready: #646. Active traps carry evidence, avoidance guidance, and expiry or contradiction state, and can be captured with craik knowledge trap.
Negative knowledge
Ready: #647. Evidence-backed negative statements remain scoped and freshness-bounded, and contradictions are opened instead of silently replacing positive assertions.
Scratchpad with expiry
Ready: #645. Temporary working notes are captured, expire, and are excluded from active summaries after expiry.
First-class unknowns
Ready: #648. Unresolved and resolved unknowns carry owner, next action, resolution state, and receipt linkage for resolution.
Structured context requests
Ready: #650. Missing context can be requested, fulfilled, cancelled, linked to handoffs, recovery, or unknowns, and fulfilled with receipt linkage.
"What changed since last time" deltas
Ready: #652. Run deltas summarize current and previous handoff, case-file, receipt, contradiction, and constraint state.
Agent exit discipline
Ready: #651. Handoff creation enforces validation, handoff, residual-risk, next-step, unknown, and open-context checks unless an explicit blocked-exit override rationale is recorded.
Release readiness and docs assessment
Ready: #649. Release-readiness docs now record v0.5.0 validation, remediation, and release-prep gates.
v0.6.0 · Skills, plugins, and ecosystem foundations
Required outcome. Craik can support reusable skills and governed plugins without weakening the runtime security model.
Skill package format
Ready: #659. Skill packages use semantic package versions, docs and entrypoints, no runtime authority, and explicit context declarations for expected inputs.
Project-scoped & global skills
Ready: #660. Registries enforce project/global scope, active entry completeness, active-only precedence, and project override precedence.
Context contracts for skills
Ready: #661. Invocation contexts capture redacted inputs, outputs, omissions, policy links, receipts, and package requirement validation.
Plugin descriptor format
Ready: #662. Descriptors declare identity, trust boundary, semantic versioning, entrypoints, capability requests, docs, security notes, and compatibility without granting authority.
Probationary plugins
Ready: #663. Probation records keep new or changed plugins out of durable trust until criteria, evidence, compatibility checks, and decisions are recorded.
Plugin capability grants
Ready: #664. Plugin grants are descriptor-bound, evidence-linked, scoped to explicit operations and targets, approval-aware, and expiry-checked.
Plugin receipts
Ready: #665. Plugin receipts are redacted, descriptor-linked, grant-linked, evidence-linked, handoff-linked, and operator-visible with optional probation state.
Adapter packages
Ready: #666. Adapter packages declare semantic versions, entrypoints, capability surfaces, runner modes, Python/platform compatibility, docs, provenance, and linked plugins.
Reference integrations
Ready: #667. Reference integrations provide safe reproducible skill, plugin, and adapter examples with narrow matching links, checks, fixtures, receipts, and provenance.
Community skills docs
Ready: #668. The guide covers package authoring, context declarations, project/global registry scope, review expectations, and security boundaries.
Community plugins docs
Ready: #669. The guide covers descriptors, probation, grants, receipts, adapters, references, and no-ambient-authority review boundaries.
Release readiness and docs assessment
Ready: #670. Release-readiness docs record v0.6.0 goal status, validation, security notes, blockers, changelog coverage, and release automation hygiene.
v0.7.0 · Operator experience
Required outcome. Operators can inspect project state without reading raw logs.
Dashboard / TUI decision
Ready · CLI-first craik operator overview selected for v0.7.0.
Work graph explorer
Ready · craik operator work-graph renders terminal and JSON graph inspection.
Handoff viewer
Ready · craik operator handoff renders durable handoff summaries.
Receipt viewer
Ready · craik operator receipt renders capability and plugin receipts.
Contradiction inbox
Ready · craik operator contradictions lists review-only contradiction state.
Evidence & assumption views
Ready · craik operator evidence keeps assumptions separate from evidence.
Delegation queue
Ready · craik operator delegations lists human delegation points.
Budget / quota view
Ready · craik operator budget displays missing budget data explicitly.
Instruction distillation view
Ready · craik operator instructions renders sources, snapshots, provenance, proposals, and reviews.
Quality gate view
Ready · craik operator quality summarizes handoff, evidence, critic, and red-team signals.
Memory impact preview
Ready · craik operator memory-impact inspects previewed durable-memory effects.
Known traps view
Ready · craik operator traps renders known traps and negative knowledge with project and task filters.
Run delta view
Ready · craik operator run-delta inspects persisted recovery and continuity deltas.
The operator surface is session-bound and scoped before release prep: read-only commands require an active operator session, multi-project list views require explicit project scope, and operator text/JSON output uses the same sanitization and redaction boundary as runtime memory and receipt paths.
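Applying one redaction boundary to operator text/JSON output means walking the same structures that runtime receipts carry. The recursive sketch below masks secret-looking keys and values. The key and value patterns are illustrative assumptions, far narrower than a real redaction utility would need. The key pattern matches whole name segments so that counters such as max_tokens survive.

```python
import re

# Hypothetical patterns for illustration only.
SECRET_KEY = re.compile(r"(?:^|[_-])(?:secret|token|password|api[_-]?key|authorization)$", re.IGNORECASE)
SECRET_VALUES = [re.compile(r"sk-[A-Za-z0-9]{16,}"), re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+")]
REDACTED = "[REDACTED]"


def redact(value):
    """Recursively redact secret-looking keys and values before rendering."""
    if isinstance(value, dict):
        return {
            k: REDACTED if isinstance(k, str) and SECRET_KEY.search(k) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        for pattern in SECRET_VALUES:
            value = pattern.sub(REDACTED, value)
    return value
```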
v0.8.0 · Operator integrations and always-on gateway
Required outcome. Craik can run as an always-on operator service with controlled ingress from external channels.
Gateway daemon mode
craik setup wizard
craik doctor diagnostics
craik update guidance
Channel adapter contract
First messaging channel adapter
Inbound identity & pairing
Channel allowlists
Channel-scoped policy envelopes
Webhook ingress
Scheduled automations
Cron-like task creation
Gateway receipts
Gateway troubleshooting docs
Deferred until this phase or later.
Broad channel matrix · consumer assistant positioning · open inbound DM behavior · mobile companion surfaces.
v0.9.0 · Persistent agent runtime, providers, and sandboxes
Required outcome. A user can launch a persistent Craik agent with craik or craik run, authenticate it against OpenAI, Anthropic, Gemini, or local models, and have Craik choose model/provider/runtime execution paths while explicitly enforcing environment boundaries across multiple sandbox backends.
Persistent Craik agent runtime
craik / craik run launch UX
Agent lifecycle commands
Start · status · stop · restart.
Agent session state contract
Agent session event contract
Prompt · run · receipt · handoff · interruption · exit.
Provider authentication flow
OpenAI · Anthropic · Gemini · local models.
Guided provider setup UX
Provider-backed agent sessions
Interactive prompt loop
Run / agent boundary decision
Model provider registry
Provider switching UX
Provider failover policy
Provider budget & quota links
Local model routing
Local model presets
OpenAI-compatible · Ollama-style endpoints.
Gemini provider/runtime path
Provider certification matrix
Failure recovery
Reconnect · resume · auth expiry · sandbox failure.
Persistent-agent security model
End-to-end launch demo
MCP client integration
MCP server / export decision
Local process backend
Docker sandbox backend
SSH or remote shell backend
Browser / tool execution boundary
Environment capability receipts
Sandbox policy tests
Provider routing docs
Implementation status: ready for release prep.
The v0.9.0 goal workflow shipped through milestone issues #737, #738, #740, #741, #742, #743, #744, and #745. Release prep remains responsible for the final version bump, changelog, signed tag, package publication, docs publication, and post-release verification.
v0.10.0 · Agent shell, progressive setup, and learning controls
Required outcome. A user can launch Craik with craik before any provider or operator auth is configured, receive clear in-runtime guidance, configure auth/model/session state through browser-assisted and slash-command flows, and review self-improving skill changes without allowing agents to silently rewrite their own authority.
craik interactive agent shell
craik chat and quiet one-shot mode
Progressive setup states
Unconfigured · fixture · local model · operator-only · provider-only · fully ready · restricted/offline.
Runtime slash-command registry
/help · /setup · /auth · /provider · /model · /status · /doctor · /sessions · /approvals.
Browser-assisted provider login
OpenAI · Anthropic · Gemini · local models, using official OAuth where available and guided secure key capture otherwise.
Secure credential storage
OS keychain backends plus explicit file-backed fallback warnings.
Model UX layer
List · status · set · probe · aliases · fallbacks · in-session switching.
Session UX
List · show · resume · rename · export · prune · delete.
Profiles and personas
Isolated provider config, sessions, skills, memory, and gateway state.
Skill performance telemetry
Autonomous skill proposals
Skill improvement proposals
Skill eval / replay harness
Periodic memory review nudges
Preference modeling as facts
Learning-loop receipts
Promotion approval gates
Rollback path
For bad skill updates.
Usage and insight summaries
Provider calls · tokens · costs where known · approvals · denials · session activity · skill impact.
Trajectory export format
Trajectory compression
Learning-loop docs
Builds on instruction distillation and the skill/plugin system. Agents may propose changes to skills, but changes remain reviewable until policy allows promotion. The agent shell is the public interaction surface for setup and learning controls; subsystem CLI commands remain available for automation.
Implementation status: ready for release prep.
The v0.10.0 goal workflow shipped through milestone issue #779. Release prep is tracked in #781 and remains responsible for the final version bump, changelog, signed tag, package publication, docs publication, and post-release verification.
v0.11.0 · TUI, dashboard, desktop, gateway operations, and channels
Required outcome. Craik can expose durable agent work through a keyboard-first TUI, authenticated local dashboard, desktop companion surface, manageable gateway service lifecycle, first real channel adapters, and multimodal companion contracts without compromising its policy and evidence model.
craik --tui
Shared slash commands · model/session pickers · approvals · run/handoff/receipt panels · streaming output.
craik dashboard
Authenticated local web dashboard for status, sessions, runs, approvals, provider/model state, gateway logs, and skill proposals.
Desktop companion MVP
Gateway control · provider health · approval notifications · dashboard launch · diagnostics.
Gateway service lifecycle
Install · uninstall · start · stop · restart · status · logs · doctor.
Real channel adapters
WebChat · Telegram · Discord · Slack.
Channel pairing and allowlists
Channel-scoped policy envelopes
Approval queue UX
Shell · TUI · dashboard · desktop notifications.
Product-grade diagnostics
craik doctor --fix for narrow, explicit setup and security posture repairs.
Update workflow
craik update --check · craik update.
Voice I/O posture
Speech-to-text adapter contract
Text-to-speech adapter contract
Multimodal artifact references
Desktop companion app security
Mobile companion app decision
Visual workspace decision
Work graph → workspace bridge
Accessibility requirements
Multimodal redaction tests
Product surface phase.
This phase turns the governed runtime into a usable local agent product. The companion surfaces must share the same command/action registry, auth model, policy gates, and receipt boundaries as the CLI.
The v0.11.0 goal workflow is complete. Release prep is tracked in #823 and remains responsible for the final version bump, changelog, signed tag, package publication, docs publication, and post-release verification.
v0.12.0 · Migration, ecosystem compatibility, and i18n
Required outcome. Teams can adopt Craik from adjacent tools and operate it in broader language and ecosystem contexts through executable import dry-runs, compatibility fixtures, bridge protocols, secret migration policy, and localized operator-facing surfaces.
Adjacent-runtime migration inspect
Adjacent-runtime migration plan
Adjacent-runtime import dry-run
Migration reports
Automatic imports · manual actions · skipped secrets · security posture changes · next commands.
Memory / skill / config migration maps
Secret migration implementation
No raw secret copy by default · OS keychain import · redacted migration receipts.
Compatibility fixtures
Provider config · model fallback · profile · channel binding · session · memory · skill · schedule · sandbox shapes.
MCP server mode
MCP client config import/export
Session export/import compatibility
Agent/client protocol bridge decision
Multi-agent workflow bridge
Locale / i18n framework
Localized shell/TUI/dashboard messages
Translated docs strategy
Ecosystem compatibility tests
v0.12.x fast-follow status: provider OAuth suite ready for release prep.
The v0.12.7 goal workflow shipped provider OAuth contracts, loopback PKCE helpers, Anthropic Claude CLI delegation, Gemini/Vertex ADC and service-account login through google-auth, OpenAI browser PKCE OAuth, provider-specific header handling, craik auth login <provider> --mode=api-key|oauth|claude-cli, billing-surface status metadata, callback-safety CI, and current authentication docs through milestone issues #936, #937, #940, #938, #939, and #941.
Release prep is tracked in #942.
Post-MVP stability · Professional agent runtime
Required outcome. Craik is stable enough for external teams to use for real multi-agent software-delivery workflows.
Graduation gate, not a scheduled release.
Ship a robust 0.x.0 MVP first, then continue shipping 0.x.0 releases until the bar below is met by real usage, documentation maturity, compatibility confidence, and security posture.
Required capabilities. MVP-readiness items are tracked in Robust MVP Roadmap and must land before the first usable 0.x.0.
Stable core schemas
Migration path
For persisted state.
SemVer release process
Package publication
Security release process
Complete CLI/reference docs
Production Stigmem integration
Documented limits & failure modes
Runnable demo
Community contribution path
≥1 complete runner adapter end-to-end
Policy tests in CI
Public/internal boundary classifier
Provenance-aware documentation
Memory hygiene workflow
Work product classification
Decision record suggestions
Learning without self-trust
Confidence requirements before 1.0.0:
- At least one complete runner adapter has been used successfully on real workflows.
- Stigmem-backed memory has soaked on real projects.
- Persisted schema migrations have been exercised.
- Security and redaction behavior has been tested under realistic agent runs.
- Documentation is complete enough for external users without maintainer hand-holding.
- Community contribution and support expectations are clear.
- Known limitations are documented honestly.
Executable workstreams
Each workstream below becomes one or more GitHub milestones/issues. Documentation requirements are part of the definition of done.
0 · Project foundation
Scope: package metadata · Python 3.12+ skeleton · craik CLI · MIT license · governance files · dependency lock strategy · CI quality gates · package-name reservation or publication.
Validation: craik --version works · tests run in CI · lint/type checks run in CI · package metadata validates.
Docs: installation · quickstart stub · contribution guide updates · release/support note · limitations note for pre-0.1.0.
1 · Runtime contracts
Scope: task request · project profile · policy envelope · capability grant · capability receipt · case file · agent role · worker result · handoff · memory proposal · memory backend capabilities · contradiction report · work graph event · evidence reference · assumption · delegation point · intent lock · instruction distillation item · quality gate result · artifact classification.
Validation: schema fixtures · invalid fixture tests · JSON serialization tests · version field tests.
Docs: schema reference · examples for each contract · versioning and migration policy.
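The sketch below shows roughly what a versioned contract with version-field checks and JSON round-tripping could look like. It is a minimal sketch with several assumptions. The roadmap plans Pydantic schemas, but this version uses standard-library dataclasses so it runs without dependencies. The CapabilityReceipt fields, SCHEMA_VERSION, and the allowed decision values are all hypothetical and are not Craik's actual schema.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical: the only schema version this sketch understands.
SCHEMA_VERSION = 1
VALID_DECISIONS = frozenset({"allowed", "denied"})


@dataclass(frozen=True)
class CapabilityReceipt:
    """Illustrative capability receipt; field names are assumptions, not Craik's schema."""

    schema_version: int
    receipt_id: str
    capability: str
    decision: str
    reason: str = ""

    def __post_init__(self) -> None:
        # Version field check: reject payloads from an unknown schema version.
        if self.schema_version != SCHEMA_VERSION:
            raise ValueError(f"unsupported schema_version: {self.schema_version}")
        # Invalid-fixture check: reject decisions outside the allowed set.
        if self.decision not in VALID_DECISIONS:
            raise ValueError(f"invalid decision: {self.decision!r}")

    def to_json(self) -> str:
        # sort_keys keeps fixture output byte-for-byte stable.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "CapabilityReceipt":
        # Unknown keys raise TypeError from the generated __init__.
        return cls(**json.loads(payload))
```

A round trip through to_json and from_json that compares equal is the kind of JSON serialization test listed above. A versioning and migration policy would decide what happens once schema_version moves past 1.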
2 · Local state and project registry
Scope: ~/.craik default home · CRAIK_HOME override · config/ · secrets/ · state/ · cache/ · logs/ · receipts/ · handoffs/ · case-files/ · projects/ · secure permissions where supported · SQLite store · project registry · immutable path config · project-local .craik/ opt-in only.
Validation: path resolver tests · permission tests · registry persistence tests · project-local opt-in tests.
Docs: configuring Craik home · local state layout reference · secrets handling guide.
3 · Policy, grants, redaction, receipts
Scope: strict / trusted-local / automation profiles · fail-open profile visibility · capability grants · immutable path protection · central redaction utility · shell/file/GitHub/memory grant enforcement · receipt persistence · policy denial receipts.
Validation: policy fixture tests · redaction tests · immutable-path tests · fail-open receipt tests · automation fail-closed tests.
Docs: policy profiles reference · fail-open guide · capability grants guide · redaction and secrets docs.
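The central redaction utility could start as small as the sketch below. It assumes that strings bound for receipts, handoffs, and logs all pass through one function before being persisted. The rule list and the redact name are illustrative assumptions. A real rule set would be broader and configurable.

```python
import re

# Illustrative (pattern, replacement) rules, not Craik's actual rule set.
SECRET_RULES = (
    # Token-shaped strings with a GitHub-style "ghp_" prefix.
    (re.compile(r"ghp_[A-Za-z0-9]{36}"), "[REDACTED]"),
    # key=value or key: value pairs whose key names a credential; keep the key.
    (re.compile(r"(?i)\b(api[_-]?key|token|password)(\s*[:=]\s*)\S+"), r"\1\2[REDACTED]"),
)


def redact(text: str, known_secrets: tuple[str, ...] = ()) -> str:
    """Mask registered secret values first, then anything matching SECRET_RULES."""
    # Longest values first so a secret that contains another is masked whole.
    for secret in sorted(known_secrets, key=len, reverse=True):
        if secret:
            text = text.replace(secret, "[REDACTED]")
    for pattern, replacement in SECRET_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Masking exact registered values before applying patterns means a secret loaded from secrets/ is caught even when it matches no known shape.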
4 · Case files, intent, evidence, assumptions
Scope: task intent lock · repository state ingestion · docs and ADR discovery · default discovery exclusions for generated/dependency/build/cache/archive-heavy paths · project and user override rules · visible context-debt metadata · Stigmem/local fact loading · GitHub context placeholders · evidence references · assumption ledger · context budget metadata · stale-risk markers · context explanations · structured context requests · first-class unknowns · context debt tracking.
Validation: deterministic fixture output · evidence reference tests · assumption promotion tests · context inclusion/exclusion tests · default exclusion tests · override tests · stale-risk tests.
Docs: case file concept doc · using case files guide · evidence and assumptions guide · context budgeting guide · context discovery and exclusion guide.
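Default discovery exclusions with project and user overrides could be sketched as an ordered rule lookup that returns a reason alongside each decision. The reason can then feed context explanations. The default patterns and the precedence order used here (user include, then project include, then project exclude, then default exclude) are assumptions for illustration, as is the function name.

```python
from fnmatch import fnmatchcase

# Hypothetical defaults for generated, dependency, build, cache, and archive-heavy paths.
DEFAULT_EXCLUDES = (
    "node_modules/*", ".venv/*", "dist/*", "build/*",
    "*.pyc", "*.zip", "*.tar.gz",
)


def classify_path(
    path: str,
    *,
    user_includes: tuple[str, ...] = (),
    project_includes: tuple[str, ...] = (),
    project_excludes: tuple[str, ...] = (),
) -> tuple[bool, str]:
    """Return (included, reason) so a case file can explain every decision.

    Assumed precedence: user include > project include > project exclude > default exclude.
    """
    ordered = (
        (True, "user include", user_includes),
        (True, "project include", project_includes),
        (False, "project exclude", project_excludes),
        (False, "default exclude", DEFAULT_EXCLUDES),
    )
    for included, source, patterns in ordered:
        for pattern in patterns:
            # fnmatchcase is case-sensitive on every OS, keeping fixture output
            # deterministic; note that "*" also matches "/" here.
            if fnmatchcase(path, pattern):
                return included, f"{source}: {pattern}"
    return True, "no rule matched"
```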
5 · Handoffs, self-audit, exit discipline
Scope: structured handoff · Markdown handoff · self-audit before handoff · incomplete-run handoff · handoff quality score · unresolved questions · next steps · receipt links · memory proposal links · context debt links.
Validation: handoff schema tests · self-audit checklist tests · quality score fixture tests · interrupted-run fixture tests.
Docs: handoff concept doc · writing handoffs guide · self-audit reference · recovery and incomplete-run guide.
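The relationship between a structured handoff, its self-audit, a quality score, and the rendered Markdown form might look like the sketch below. It is a sketch under stated assumptions. The Handoff fields, the four checklist items, the status values, and the naive fraction-of-checks score are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Illustrative structured handoff; field names are assumptions."""

    summary: str
    status: str  # assumed values: "complete", "blocked", "failed", "interrupted"
    next_steps: list[str] = field(default_factory=list)
    unresolved_questions: list[str] = field(default_factory=list)
    receipt_ids: list[str] = field(default_factory=list)


def _checks(h: Handoff) -> dict[str, bool]:
    # Hypothetical self-audit checklist, evaluated in a fixed order.
    return {
        "has summary": bool(h.summary.strip()),
        "has next steps": bool(h.next_steps),
        "links receipts": bool(h.receipt_ids),
        # A run that did not complete must say what is still open.
        "explains incomplete run": h.status == "complete" or bool(h.unresolved_questions),
    }


def self_audit(h: Handoff) -> list[str]:
    """Return the checklist items this handoff fails."""
    return [name for name, ok in _checks(h).items() if not ok]


def quality_score(h: Handoff) -> float:
    """Fraction of checklist items satisfied (a deliberately naive score)."""
    checks = _checks(h)
    return sum(checks.values()) / len(checks)


def to_markdown(h: Handoff) -> str:
    """Render the Markdown form from the structured form."""
    lines = [f"# Handoff ({h.status})", "", h.summary, "", "## Next steps"]
    lines += [f"- {s}" for s in h.next_steps] or ["- none recorded"]
    lines += ["", "## Unresolved questions"]
    lines += [f"- {q}" for q in h.unresolved_questions] or ["- none"]
    return "\n".join(lines) + "\n"
```

In this sketch, an interrupted run with no unresolved questions fails the self-audit. That failure is what gives the incomplete-run handoff its value.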
6 · Memory backends and Stigmem integration
Scope: ephemeral backend · local backend · Stigmem backend · capability detection · health and metadata checks · fact query/list/get/write · provenance reads · optional recall · optional conflicts · local proposal model · memory diff · memory impact preview · source identity handling · source attestation handling · error mapping.
Validation: backend interface tests · local backend persistence tests · Stigmem integration tests against a local node · auth failure tests · optional-capability fallback tests · memory diff tests.
Docs: memory backend reference · connecting Stigmem guide · Stigmem compatibility matrix · memory proposal and promotion guide · memory impact preview guide.
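Capability detection with optional-capability fallback can be sketched by having each backend advertise a set of capability names. Callers then branch on that set rather than on the backend's type. The Protocol, the LocalBackend class, and the capability strings are hypothetical. They do not describe Stigmem's actual API.

```python
from typing import Protocol


class MemoryBackend(Protocol):
    # Capability names the backend advertises, e.g. {"query", "write", "recall"}.
    capabilities: frozenset[str]

    def query(self, text: str) -> list[str]: ...


class LocalBackend:
    """Hypothetical local backend: substring query only, no recall or conflicts."""

    capabilities = frozenset({"query", "write"})

    def __init__(self) -> None:
        self.facts: list[str] = []

    def write(self, fact: str) -> None:
        self.facts.append(fact)

    def query(self, text: str) -> list[str]:
        return [f for f in self.facts if text.lower() in f.lower()]


def lookup(backend: MemoryBackend, text: str) -> tuple[str, list[str]]:
    """Prefer optional recall when advertised; otherwise fall back to plain query."""
    if "recall" in backend.capabilities:
        return "recall", backend.recall(text)  # type: ignore[attr-defined]
    return "query", backend.query(text)
```

Returning which path was taken lets receipts and tests confirm that a fallback actually happened.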
7 · GitHub adapter and demo workflow
Scope: GitHub auth detection · repository metadata · issues · PRs · changed files · check status · guarded GitHub comments/issues/PR creation · first Stigmem docs reconciliation demo.
Validation: mocked GitHub adapter tests · read-only fallback tests · permission failure tests · fixture demo run.
Docs: GitHub adapter guide · first Stigmem reconciliation demo · public/internal boundary guidance · troubleshooting guide.
8 · Work graph, contradictions, delegation
Scope: graph nodes and edges · task/handoff/fact/proposal/receipt/evidence/assumption/delegation/artifact nodes · contradiction reports · Stigmem conflict linking · local contradiction reports · human delegation points · approval/clarification/policy-override/memory-promotion/release-signoff requests.
Validation: graph export tests · contradiction lifecycle tests · delegation lifecycle tests · unresolved delegation block tests.
Docs: work graph concept doc · contradiction inbox guide · human delegation guide · graph export reference.
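A small in-memory graph illustrates graph export and the rule that unresolved delegations block a task. In this sketch, the node kinds, the blocked_by edge label, and the status values are assumptions for illustration, and WorkGraph is a hypothetical name.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WorkGraph:
    """Hypothetical in-memory work graph: typed nodes plus labeled edges."""

    nodes: dict[str, dict] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, kind: str, **attrs) -> None:
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, src: str, dst: str, label: str) -> None:
        # Refuse dangling edges so exports stay internally consistent.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in edge {src} -> {dst}")
        self.edges.append((src, dst, label))

    def blockers(self, task_id: str) -> list[str]:
        """Delegation points attached to a task that are not yet resolved."""
        return [
            dst
            for src, dst, label in self.edges
            if src == task_id
            and label == "blocked_by"
            and self.nodes[dst]["kind"] == "delegation"
            and self.nodes[dst].get("status") != "resolved"
        ]

    def export_json(self) -> str:
        # Sorted keys give deterministic exports for fixture comparison.
        return json.dumps(
            {"nodes": self.nodes, "edges": [list(e) for e in self.edges]},
            sort_keys=True,
        )
```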
9 · Agent-native onboarding
Scope: craik onboard --project <project-id> · project model · active policies · ADRs and immutable paths · docs boundaries · recent handoffs · unresolved contradictions · stale-risk warnings · validation commands · Stigmem status · known traps · allowed next actions.
Validation: onboarding fixture tests · missing context tests · stale context tests · runner-readable output tests.
Docs: onboarding guide · known traps guide · project model concept doc.
10 · Runner adapters
Scope: runner adapter interface · Codex adapter · Claude adapter · Gemini adapter · runner capability matrix · policy-aware prompt compiler · runner metadata · normalized worker results · normalized handoffs · real-runner contract tests · runner trust profiles.
Validation: adapter interface tests · fixture contract tests · prompt compilation tests · runner capability matrix tests · real-runner smoke tests when credentials/tools are available.
Docs: runner adapter contract · Codex adapter guide · Claude adapter guide · Gemini adapter guide · prompt compiler reference · runner capability matrix reference.
11 · Single-agent execution loop
Scope: run id and run status model · task run state machine · plan/act/observe/evaluate/continue/stop phases · runner step contract · bounded case-file context with default exclusions and overrides · max-iteration limit · timeout and budget limits · intent-lock stop-condition enforcement · approval and grant checks before side effects · step receipts · observed output capture · memory proposal hooks · handoff on completion/block/failure/interruption · run resume · run recovery · agent exit discipline.
Validation: state-machine transition tests · max-iteration and timeout tests · budget exhaustion tests · stop-condition enforcement tests · approval-block tests · receipt-per-step tests · interrupted-run resume tests · handoff-on-failure tests · runner fixture tests · polluted-context fixture tests.
Docs: single-agent execution loop concept doc · running tasks guide · run state reference · resume and recovery guide · loop policy guide · context discovery override guide.
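The loop's phases can be modelled as an explicit state machine. The sketch makes four assumptions:
- It uses a fixed transition table.
- The intent lock's stop condition and the approval check are injected callables.
- It records one receipt string per transition.
- Timeouts, budgets, and resume are out of scope.

Phase, TRANSITIONS, and run_loop are hypothetical names.

```python
from collections.abc import Callable
from enum import Enum


class Phase(Enum):
    PLAN = "plan"
    ACT = "act"
    OBSERVE = "observe"
    EVALUATE = "evaluate"
    CONTINUE = "continue"
    STOP = "stop"


# Assumed legal transitions between the phases named in the roadmap.
TRANSITIONS = {
    Phase.PLAN: {Phase.ACT, Phase.STOP},
    Phase.ACT: {Phase.OBSERVE, Phase.STOP},
    Phase.OBSERVE: {Phase.EVALUATE},
    Phase.EVALUATE: {Phase.CONTINUE, Phase.STOP},
    Phase.CONTINUE: {Phase.PLAN},
    Phase.STOP: set(),
}


def run_loop(
    done: Callable[[int], bool],
    approved: Callable[[int], bool],
    *,
    max_iterations: int,
) -> tuple[str, list[str]]:
    """Drive one run and return (stop_reason, step_receipts).

    done(i): hypothetical intent-lock stop condition, checked during evaluate.
    approved(i): hypothetical grant/approval check, made before the act phase.
    """
    receipts: list[str] = []

    def move(current: Phase, nxt: Phase, i: int) -> Phase:
        if nxt not in TRANSITIONS[current]:
            raise RuntimeError(f"illegal transition {current.value} -> {nxt.value}")
        receipts.append(f"{i}:{current.value}->{nxt.value}")  # one receipt per step
        return nxt

    phase = Phase.PLAN
    for i in range(max_iterations):
        if not approved(i):
            move(phase, Phase.STOP, i)
            return "blocked: approval required", receipts
        phase = move(phase, Phase.ACT, i)
        phase = move(phase, Phase.OBSERVE, i)
        phase = move(phase, Phase.EVALUATE, i)
        if done(i):
            move(phase, Phase.STOP, i)
            return "stop condition met", receipts
        phase = move(phase, Phase.CONTINUE, i)
        phase = move(phase, Phase.PLAN, i)
    move(phase, Phase.STOP, max_iterations)
    return "max iterations reached", receipts
```

Routing every phase change through one guarded function means an illegal transition fails loudly. That guard is also the single place where step receipts are written.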
12 · Multi-agent coordination
Scope: orchestrator · specialist tasks · parallel read-only investigations · implementer/verifier/adversarial-reviewer/policy-reviewer/docs-reviewer/memory-curator/release-reviewer/adjudicator roles · typed worker results · cross-agent review protocol · structured agent debate · scope-change protocol.
Validation: child task graph tests · typed worker result tests · debate/adjudication fixture tests · unresolved-contradiction block tests · scope-change proposal tests.
Docs: multi-agent workflows guide · role reference · review protocol guide · structured debate guide.
13 · Runtime instruction distillation
Scope: declared instruction source registry · AGENTS.md · CLAUDE.md · GEMINI.md · HERMES.md · SKILLS.md · .cursorrules · .github/copilot-instructions.md · .codex/instructions.md · source hash tracking · line/range provenance · extraction categories · distillation proposals · stale distillation invalidation · instruction contradiction reports · promotion approval.
Validation: Markdown fixture tests · source hash invalidation tests · extraction category tests · contradiction fixture tests · approval/promotion tests.
Docs: instruction distillation concept doc · declaring instruction sources guide · distillation review guide · instruction categories reference.
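The sketch below connects source hash tracking, line provenance, and stale distillation invalidation. Each distilled item records a SHA-256 of its whole source file at extraction time. An item becomes stale as soon as that file's current hash differs. The bullet-only extractor, the DistilledItem fields, and the category string are hypothetical simplifications.

```python
import hashlib
from dataclasses import dataclass


def source_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DistilledItem:
    """Hypothetical distillation proposal with line-range provenance."""

    source: str         # e.g. "AGENTS.md"
    start_line: int
    end_line: int
    category: str       # hypothetical extraction category
    text: str
    source_sha256: str  # hash of the whole source file at extraction time


def extract_bullets(source: str, text: str, category: str) -> list[DistilledItem]:
    """Naive extractor: every "- " line becomes one item with 1-based line provenance."""
    digest = source_hash(text)
    return [
        DistilledItem(source, n, n, category, line[2:].strip(), digest)
        for n, line in enumerate(text.splitlines(), start=1)
        if line.startswith("- ")
    ]


def stale_items(items: list[DistilledItem], current: dict[str, str]) -> list[DistilledItem]:
    """Items whose source changed or disappeared since extraction."""
    return [
        item
        for item in items
        if item.source not in current
        or source_hash(current[item.source]) != item.source_sha256
    ]
```

Hashing the whole file is coarse: any edit invalidates every item from that file. A per-range hash would be finer-grained at the cost of more bookkeeping.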
14 · Quality gates and freshness
Scope: runtime critic · red team mode · evidence coverage score · tool result attestation · knowledge freshness probes · evidence expiration rules · negative knowledge · runtime memory hygiene · decision record suggestions · learning without self-trust.
Validation: critic fixture tests · red team policy tests · evidence coverage tests · tool-result source tests · freshness probe tests · memory hygiene proposal tests.
Docs: quality gates guide · freshness and staleness guide · negative knowledge guide · memory hygiene guide · decision record suggestion guide.
15 · Budgets, quotas, and operational bounds
Scope: context token budgets · model spend budgets · wall-clock budgets · shell command count · GitHub write count · memory write count · parallel worker count · retry count · approval count · budget receipts · budget escalation/block behavior.
Validation: budget accounting tests · exhaustion behavior tests · fail-open budget receipt tests · policy profile budget tests.
Docs: budget and quota guide · policy budget reference · troubleshooting budget exhaustion.
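Budget accounting with receipts and distinct exhaustion behaviors could be sketched as a per-counter object. Three exhaustion modes are assumed:
- "block" refuses the spend.
- "escalate" hands the decision upward.
- "allow" is a fail-open mode that spends anyway but leaves a visible receipt.

The Budget class and these mode names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Budget:
    """Hypothetical counter budget for one run (e.g. GitHub writes or shell commands)."""

    name: str
    limit: int
    # Assumed exhaustion behaviors: "block", "escalate", or "allow" (fail-open).
    on_exhaust: str = "block"
    used: int = 0
    receipts: list[str] = field(default_factory=list)

    def charge(self, amount: int = 1) -> str:
        """Try to spend `amount`; always record a receipt, even when over budget."""
        over = self.used + amount > self.limit
        if over and self.on_exhaust != "allow":
            # Blocked or escalated: nothing is consumed until someone decides.
            outcome = self.on_exhaust
        else:
            self.used += amount
            # Fail-open still spends, but the receipt makes the overrun visible.
            outcome = "allow (over budget)" if over else "ok"
        self.receipts.append(f"{self.name}: +{amount} -> {outcome} ({self.used}/{self.limit})")
        return outcome
```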
16 · Recovery and continuity
Scope: recovery mode · partial receipt loading · scratchpad restore · changed file detection · unfinished handoff recovery · unresolved delegation restore · "what changed since last time" deltas · run delta summaries.
Validation: interrupted-run fixtures · recovery command tests · delta calculation tests · partial handoff tests.
Docs: recovery guide · run deltas guide · interruption handling reference.
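Changed-file detection for "what changed since last time" deltas can be sketched as comparing two content-hash snapshots of a working tree. This assumes the recovery path keeps the previous snapshot alongside the run's receipts. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root, POSIX style) to a SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def run_delta(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Summarize added, removed, and modified files between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }
```

Hashing content rather than comparing modification times avoids false positives when a file is touched but not changed.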
17 · Artifact and documentation intelligence
Scope: work product classification · provenance-aware documentation · public/internal boundary classifier · generated doc evidence links · docs stale-state detection · release note classification · audit artifact classification.
Validation: classifier fixture tests · public/internal boundary tests · provenance link tests · stale doc fixture tests.
Docs: artifact classification reference · provenance-aware docs guide · public/internal boundary guide · docs maintenance guide.
18 · Skills, plugins, and community ecosystem
Scope: skill package format · project-scoped skills · global skills · community skills layout · plugin descriptor format · probationary plugin policy · plugin capability grants · plugin receipts · adapter package guidance · reference integrations · marketplace/index format decision.
Validation: skill loader tests · plugin descriptor validation tests · probationary policy tests · plugin receipt tests · community package fixture tests.
Docs: skills concept doc · writing skills guide · community skills guide · plugin contract reference · writing plugins guide · plugin security guide · marketplace/index guide.
19 · Operator experience
Scope: TUI/dashboard decision · work graph explorer · handoff viewer · receipt viewer · contradiction inbox · evidence and assumption views · delegation queue · budget view · instruction distillation view · quality gate view · memory impact preview · known traps view · run delta view.
Validation: UI/TUI smoke tests · nonblank rendering checks · fixture state rendering tests · accessibility and keyboard navigation checks for UI surfaces.
Docs: operator guide · dashboard/TUI guide · view reference · troubleshooting guide.
20 · Operator integrations and always-on gateway
Scope: gateway daemon mode · setup wizard · diagnostics command · update guidance · channel adapter contract · first messaging channel adapter · inbound identity and pairing model · channel allowlists · channel-scoped policy envelopes · webhook ingress · scheduled automations · gateway receipts.
Validation: daemon lifecycle tests · setup wizard fixture tests · diagnostics failure-mode tests · webhook signature tests · channel identity mapping tests · scheduled task creation tests · gateway receipt tests · v0.8.0 gateway pipeline e2e test.
Docs: gateway guide · setup guide · diagnostics guide · channel adapter reference · webhook reference · scheduler guide · gateway security guide.
21 · Persistent agent runtime, providers, and sandboxes
Scope: persistent Craik agent runtime · craik / craik run launch UX · agent lifecycle commands · agent session state contract · agent session event contract · provider authentication flow for OpenAI, Anthropic, Gemini, and local models · guided provider setup UX · provider-backed agent sessions · interactive prompt loop · run / agent boundary decision · model provider registry · provider switching UX · provider failover policy · provider budget and quota links · Gemini provider/runtime path · local model routing · local model presets · provider certification matrix · failure recovery · persistent-agent security model · end-to-end launch demo · MCP client integration · MCP server/export decision · sandbox backend contract · local process backend · Docker sandbox backend · SSH or remote shell backend · browser/tool execution boundary · environment capability receipts.
Tracking: every v0.9.0 roadmap tile maps to at least one tracked issue.
- Core session/runtime scope: #736, #739, #741, #761.
- Provider setup, Gemini, local routes, routing metadata, certification, budgets, and failover policy: #737, #738, #740, #742.
- Failure recovery, launch demo, and security hardening: #743, #744, #745, #759, #760, #761.
- MCP, sandbox backend, browser boundary, sandbox policy, and environment receipt backfills: #765, #766, #767, #768, #769, #770, #771, #772, #773.
Validation: agent launch tests · lifecycle command tests · provider authentication tests · provider session persistence tests · agent event persistence tests · OpenAI/Anthropic/Gemini/local model routing tests · guided setup tests · local model preset tests · interactive prompt loop tests · receipt and handoff linkage tests · interruption and exit behavior tests · failure recovery tests · provider certification matrix checks · provider registry tests · provider failover tests · MCP compatibility fixture tests · sandbox policy tests · backend isolation tests · environment receipt tests · budget linkage tests · end-to-end launch demo test.
Docs: persistent agent runtime guide · agent lifecycle reference · provider authentication guide · local model setup guide · provider certification matrix · provider routing guide · provider config reference · MCP integration guide · sandbox backend reference · persistent-agent security guide · execution environment security guide.
22 · Self-improving skills and learning loops
Scope: skill performance telemetry · autonomous skill proposal creation · skill improvement proposals · skill eval/replay harness · periodic memory review nudges · user/team preference facts · learning-loop receipts · approval gates for promoted skills · rollback path for bad skill updates · training/trajectory export format · trajectory compression or summarization.
Validation: skill proposal tests · skill eval fixture tests · replay determinism tests · approval gate tests · rollback tests · trajectory export tests · learning-loop receipt tests.
Docs: skill improvement guide · learning-loop policy guide · skill eval reference · trajectory export reference · rollback guide.
23 · Multimodal and companion surfaces
Scope: voice input/output posture · speech-to-text adapter contract · text-to-speech adapter contract · multimodal artifact references · desktop companion app decision · mobile companion app decision · live visual workspace/canvas decision · work graph to visual workspace bridge · accessibility requirements.
Validation: multimodal artifact schema tests · redaction tests for transcript and media metadata · accessibility checks for companion surfaces · visual workspace smoke tests where implemented · adapter contract tests.
Docs: multimodal posture doc · voice adapter reference · companion app security guide · visual workspace guide · accessibility checklist.
24 · Migration, i18n, and ecosystem compatibility
Scope: adjacent-tool import/migration assessment · multi-agent workflow import/migration assessment · import dry-run reports · memory/skill/config migration maps · secret migration policy · ecosystem compatibility guide · adjacent runtime bridge decision · multi-agent workflow bridge decision · locale/i18n framework · translated docs strategy.
Validation: import dry-run fixture tests · migration map tests · secret redaction tests · bridge compatibility smoke tests where implemented · locale fallback tests · translated docs link tests where applicable.
Docs: migration guide · import dry-run reference · secret migration policy · ecosystem compatibility guide · i18n guide · bridge decision records.
v0.1.0 issue cut
The initial issue set covers only the v0.1.0 gate and any contracts
needed to avoid rework.
- Scaffold Python package and craik CLI.
- Add core Pydantic schemas and fixtures.
- Implement ~/.craik path resolver and local state layout.
- Implement SQLite local store.
- Implement project registry.
- Implement strict / trusted-local / automation policy profiles.
- Implement capability grants and immutable path protection.
- Implement central redaction utility.
- Implement receipt store.
- Implement case file assembler with evidence, assumptions, and context budget metadata.
- Implement intent lock.
- Implement handoff writer and self-audit checklist.
- Implement local memory backend and proposal flow.
- Implement Stigmem backend minimum compatibility.
- Implement memory diff and memory impact preview foundations.
- Implement GitHub read adapter.
- Implement work graph export.
- Implement contradiction report model.
- Implement agent-native onboarding.
- Implement policy test harness and core policy tests.
- Implement Stigmem documentation reconciliation demo.
- Build initial docs tree and publish v0.1.0 user/concept/reference docs.
Each issue includes: implementation checklist · test/validation checklist · documentation checklist · security/policy impact · Stigmem fact update requirement when relevant.
Goal issue workflow
For each 0.x.0 goal issue, implementation is not complete until the
change has a pull request and the pull request has passed required CI.
The working branch is pushed after implementation, tests, docs, and
local validation are complete. Then the PR checks are watched to a
terminal state. If any required check fails, fix the failure in the same
PR branch, push again, and repeat the check wait before closing the issue
or marking the goal complete.
Do not close milestone issues from local validation alone. The goal workflow requires the PR branch to be current on GitHub and the required PR gates to be green. If the agent opened the PR and the checks are clean, the agent merges the PR, verifies the merge landed on the base branch, prunes stale local and remote branches, and only then moves to the next goal.
Documentation definition of done
- What concept changed?
- What user workflow changed?
- What CLI / API / config changed?
- What policy or security behavior changed?
- What examples should exist?
- What limitations apply?
- What facts should future agents know?
For implementation issues, docs are updated in the same PR unless the issue is explicitly internal-only scaffolding.
Release definition of done
- Passing tests
- Passing lint / type checks
- Generated or updated CLI/reference docs
- Updated roadmap state
- Updated limitations
- Security notes
- Migration notes when local state or schemas change
- Runnable demo status
- Memory update: optional release-state memory only when Stigmem is available; this is not a release gate.