Version: MVP

Implementation Plan

14 min read · For contributors · Updated 2026-05-19

What you'll find here

The buildable sequence that turns the Craik concept into a shipped runtime: stack, repository shape, milestones with build / commands / acceptance, the first end-to-end scenario, deferred decisions, and the decided project defaults.

CLI-first, contract-first, test-first.

Each milestone is a buildable slice with concrete commands and acceptance criteria. A UI waits until the CLI workflow proves the runtime contracts are correct.

Accepted stack

| Choice | Why | Notes |
| --- | --- | --- |
| Python 3.12+ | core | Stigmem already has Python surfaces; keeps Craik close to common agent-runtime conventions. |
| Typer | CLI | Type-driven CLI with minimal boilerplate. |
| Pydantic | contracts | Makes versioned contracts straightforward. |
| SQLite | state | Enough for local task, receipt, handoff, and work-graph state. |
| stdlib HTTP | first calls | For initial Stigmem and GitHub API calls. |
| pytest · ruff · mypy | quality | Test and quality gates from day one. |

Dependency posture.

Favor reproducibility: pin runtime dependencies exactly once implementation begins, and review lockfile updates deliberately. Optional provider, browser, UI, or adapter dependencies stay as extras rather than core dependencies.

Repository shape

Initial structure:

craik/
  __init__.py
  cli.py
  contracts/
    task.py
    project.py
    policy.py
    case_file.py
    receipt.py
    handoff.py
    memory.py
    graph.py
    evidence.py
    assumption.py
    delegation.py
    intent.py
    instruction.py
    quality.py
    artifact.py
  runtime/
    project_registry.py
    paths.py
    case_assembler.py
    executor.py
    handoff_writer.py
    receipt_store.py
    policy_engine.py
    budget.py
    prompt_compiler.py
    critic.py
    distiller.py
  memory/
    base.py
    ephemeral.py
    local.py
    stigmem.py
  adapters/
    repo.py
    github.py
  runners/
    base.py
    codex.py
    claude.py
    gemini.py
  orchestration/
    roles.py
    orchestrator.py
    worker_result.py
  graph/
    store.py
    export.py
tests/
docs/

Milestone 1 · Contract foundation

Package skeleton

CLI skeleton

Pydantic models for core contracts

Schema version fields

JSON serialization / deserialization

Validation tests

Commands: craik version · craik schema list · craik schema show <name>

Acceptance:

  1. All schemas validate sample fixtures.
  2. Invalid fixtures produce clear errors.
  3. Docs examples match real schemas.
  4. CI runs lint, type, and test checks.
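
A minimal sketch of a versioned contract with a JSON round trip and a clear validation error. It uses stdlib dataclasses as a stand-in so it runs without dependencies, whereas the accepted stack names Pydantic for real contracts. `TaskContract`, `SCHEMA_VERSION`, and the field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = "1"  # hypothetical version string, not defined by the plan


@dataclass(frozen=True)
class TaskContract:
    """Hypothetical task contract; field names are illustrative only."""

    task_id: str
    project_id: str
    title: str
    schema_version: str = SCHEMA_VERSION

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "TaskContract":
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("contract payload must be a JSON object")
        if data.get("schema_version") != SCHEMA_VERSION:
            raise ValueError(
                f"unsupported schema_version: {data.get('schema_version')!r}"
            )
        required = ("task_id", "project_id", "title")
        missing = [name for name in required if not isinstance(data.get(name), str)]
        if missing:
            raise ValueError(f"missing or non-string fields: {missing}")
        return cls(**{name: data[name] for name in required})
```

Rejecting an unknown `schema_version` before reading any other field is one way to keep invalid fixtures failing with a specific message rather than a generic type error.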

Milestone 2 · Project registry and local state

Craik home path resolver

CRAIK_HOME override

Default ~/.craik home

Structured home subdirectories

Secure permissions for home and secrets

SQLite local store

Project registry

Repo detection

Immutable path config

Local memory backend

Commands: craik init · craik project add <path> · craik project list · craik project show <id>

Acceptance:

  1. Default home resolves to ~/.craik.
  2. CRAIK_HOME overrides the default.
  3. config, secrets, state, cache, logs, receipts, handoffs, case-files, and projects directories are created as needed.
  4. Secrets files are owner-readable/writable where supported.
  5. Project-local .craik/ is created only by explicit opt-in.
  6. Project can be registered from a Git repo.
  7. Registry persists between commands.
  8. Project profile validates.
  9. Immutable path policy is stored.
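
The home-resolution rules above could be sketched as follows. The `CRAIK_HOME` variable, the `~/.craik` default, and the subdirectory names come from this plan; the function names `resolve_home` and `ensure_home` are hypothetical, and creating every subdirectory eagerly is a simplification of "created as needed".

```python
import os
import stat
from pathlib import Path

# Subdirectory names taken from the acceptance criteria above.
HOME_SUBDIRS = (
    "config", "secrets", "state", "cache", "logs",
    "receipts", "handoffs", "case-files", "projects",
)


def resolve_home(env=None) -> Path:
    """Return the Craik home, honoring a CRAIK_HOME override (hypothetical helper)."""
    env = os.environ if env is None else env
    override = env.get("CRAIK_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".craik"


def ensure_home(home: Path) -> None:
    """Create the home layout and restrict the secrets directory to its owner."""
    for name in HOME_SUBDIRS:
        (home / name).mkdir(parents=True, exist_ok=True)
    # Owner-only access where the platform supports POSIX permission bits;
    # on Windows os.chmod only toggles the read-only flag.
    os.chmod(home / "secrets", stat.S_IRWXU)
```

Passing the environment in as a mapping keeps the resolver testable without mutating `os.environ`.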

Milestone 3 · Case-file assembly

Repository adapter

Branch / diff inspection

Docs discovery

Default repository-context exclusions: generated, dependency, build, cache, and archive-heavy paths.

Project + user override rules: discovery include / exclude behavior.

ADR / policy discovery

Stigmem / local fact loading

Stale-risk section

Evidence references

Assumption ledger

Context budget accounting

Context debt: paths skipped by defaults, overrides, or budget pressure.

Verification-plan section

Markdown + JSON output

Commands: craik task create --project <id> --title "..." · craik case build <task-id> · craik case show <task-id>

Acceptance:

  1. Case file includes repo status.
  2. Docs and immutable paths are labeled.
  3. Generated and dependency paths are excluded by default unless explicitly included.
  4. Project and user overrides can extend, replace, or explicitly include paths from the default discovery rules.
  5. Excluded paths are visible in case-file context metadata rather than silently disappearing.
  6. Facts include source / confidence.
  7. Unsupported conclusions are tracked as assumptions.
  8. Included and omitted context is explainable.
  9. Output is deterministic for fixtures.
  10. Missing context is clearly reported.
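
As an illustration of default exclusions, overrides, and visible context debt, here is a small sketch. The glob patterns, `DiscoveryResult`, and `discover` are hypothetical; it covers only the "extend" and "explicitly include" override cases, not "replace".

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

# Hypothetical defaults; the plan does not list concrete patterns.
DEFAULT_EXCLUDES = ("node_modules/*", "dist/*", "build/*", ".venv/*", "__pycache__/*")


@dataclass
class DiscoveryResult:
    included: list = field(default_factory=list)
    # path -> reason it was skipped, so omissions stay explainable
    context_debt: dict = field(default_factory=dict)


def discover(paths, extra_excludes=(), force_include=()):
    """Split candidate paths into included context and recorded context debt."""
    result = DiscoveryResult()
    rules = (*DEFAULT_EXCLUDES, *extra_excludes)
    for path in sorted(paths):  # sorted so fixture output is deterministic
        if any(fnmatchcase(path, pattern) for pattern in force_include):
            result.included.append(path)
            continue
        hit = next((p for p in rules if fnmatchcase(path, p)), None)
        if hit is None:
            result.included.append(path)
        else:
            result.context_debt[path] = f"excluded by rule {hit!r}"
    return result
```

Recording the matching rule next to each skipped path is what lets a case file report excluded paths in its metadata instead of dropping them silently.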

Milestone 4 · Policy and receipts

Policy envelope generation

Strict / trusted-local / automation profiles

Explicit fail-open profile handling

Capability grant model

Immutable path protection

Central redaction utility

Receipt store

Policy denial receipts

Shell-command receipt wrapper

File-write receipt wrapper

Commands: craik policy show <task-id> · craik receipts list <task-id> · craik receipts show <receipt-id>

Acceptance:

  1. Strict mode is default.
  2. Fail-open is available only through named policy profiles.
  3. Denied writes are blocked.
  4. Allowed actions create receipts.
  5. Fail-open decisions create receipts.
  6. Receipts are redacted before persistence.
  7. Shell-command results are summarized.
  8. Receipts link back to task and policy envelope.
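
One possible shape for a write check that always yields a redacted receipt is sketched below. `Receipt`, `check_write`, `redact`, and the secret pattern are hypothetical and far narrower than a real redaction utility would need to be.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical pattern: matches key=value pairs with secret-looking keys.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*=\s*\S+")


def redact(text: str) -> str:
    """Replace secret-looking values before anything is persisted."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


@dataclass(frozen=True)
class Receipt:
    task_id: str
    action: str
    target: str
    decision: str  # "allowed" or "denied"
    detail: str


def check_write(task_id: str, path: str, immutable_globs, detail: str = "") -> Receipt:
    """Deny writes to immutable paths; both outcomes produce a receipt."""
    denied = any(fnmatchcase(path, pattern) for pattern in immutable_globs)
    decision = "denied" if denied else "allowed"
    return Receipt(task_id, "file-write", path, decision, redact(detail))
```

Building the receipt inside the check, rather than after the write, means a denial is recorded through the same path as an allowed action.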

Milestone 5 · Handoff loop

Structured handoff writer

Markdown handoff writer

Handoff validation

Handoff load into case file

Memory proposal attachment

Commands: craik handoff create <task-id> · craik handoff show <task-id> · craik handoff list --project <id>

Acceptance:

  1. Every completed task can produce a handoff.
  2. Handoff includes receipts.
  3. Handoff includes verification state.
  4. Handoff is loaded into the next related case file.
  5. Handoff schema validates.

Milestone 6 · Stigmem backend

Stigmem config

API key setup

Backend capability detection

Fact search / read / write

Fact proposal mapping

Provenance reads

Optional recall support

Optional conflict support

Handoff summary fact writes

Memory diff for task runs

Commands: craik connect stigmem --url <url> · craik memory search <query> · craik memory propose <task-id> · craik memory diff <task-id>

Acceptance:

  1. Craik can connect to a local Stigmem node.
  2. Backend verifies GET /healthz and GET /.well-known/stigmem.
  3. Failed auth has a clear error.
  4. Facts are included in case files.
  5. Direct fact writes require memory-write grant.
  6. Proposed facts can be reviewed before write.
  7. Written facts include provenance.
  8. Optional recall / conflict capabilities are detected without becoming required.
  9. Craik falls back to local proposals, local contradiction reports, and local memory diffs when optional Stigmem capabilities are unavailable.
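
The connection check and optional-capability detection might look like the sketch below. The two endpoint paths come from the acceptance list; everything about the response body is an assumption, in particular the `capabilities` key and the `recall` / `conflicts` names. The fetch function is injected so the probe can be tested without a running node.

```python
import json
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen

Fetch = Callable[[str], tuple]  # url -> (status_code, body_bytes)


def http_get(url: str) -> tuple:
    """Default stdlib fetch; returns the status even for HTTP error responses."""
    try:
        with urlopen(url, timeout=5.0) as resp:
            return resp.status, resp.read()
    except HTTPError as err:
        return err.code, err.read()


def probe_backend(base_url: str, fetch: Fetch = http_get) -> dict:
    """Verify required endpoints, then report which optional features exist."""
    base = base_url.rstrip("/")
    for path in ("/healthz", "/.well-known/stigmem"):
        status, body = fetch(base + path)
        if status != 200:
            raise ConnectionError(f"GET {path} returned status {status}")
    # ASSUMPTION: the metadata document is JSON with a "capabilities" list.
    declared = set(json.loads(body).get("capabilities", []))
    return {"recall": "recall" in declared, "conflicts": "conflicts" in declared}
```

A caller could treat a `False` entry as the signal to use the local fallback for that feature rather than failing the connection.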

Milestone 7 · GitHub adapter

GitHub auth detection

Repo mapping

Issue / PR reads

Changed-file reads

Check-status reads

Guarded issue / PR / comment writes

Commands: craik github status <project-id> · craik github issues <project-id> · craik github prs <project-id>

Acceptance:

  1. Case file includes relevant GitHub state.
  2. GitHub writes require grants.
  3. Created links appear in handoff.
  4. Unauthenticated mode remains usable for local-only tasks.

Milestone 8 · Work graph

Graph node / event models

Graph store

Export command

Task / handoff / fact / receipt graph links

Contradiction graph links

Human delegation-point nodes

Commands: craik graph export <project-id> · craik graph show-task <task-id>

Acceptance:

  1. Each task creates graph nodes.
  2. Handoffs and receipts are linked.
  3. Fact proposals link to evidence.
  4. Graph export is deterministic.
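
Deterministic export mostly comes down to fixing the ordering before serialization. A minimal sketch, assuming nodes carry an `id` and edges carry `source`, `target`, and `kind` (all hypothetical field names):

```python
import json


def export_graph(nodes, edges) -> str:
    """Serialize a work graph so the same content always yields the same text."""
    payload = {
        "nodes": sorted(nodes, key=lambda n: n["id"]),
        "edges": sorted(edges, key=lambda e: (e["source"], e["target"], e["kind"])),
    }
    # sort_keys removes any dependence on dict insertion order.
    return json.dumps(payload, sort_keys=True, indent=2)
```

With this approach, two stores that hold the same nodes and edges in different insertion order export byte-identical output, which makes fixture comparison a plain string equality check.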

Milestone 8a · Evidence, assumptions, and onboarding

Evidence reference model

Assumption ledger model

Belief promotion lifecycle model

Onboarding command

Context budgeting metadata

Provenance-aware documentation links

Decision-record suggestion hooks

Commands: craik onboard --project <project-id> · craik assumptions list <task-id> · craik evidence show <task-id>

Acceptance:

  1. Case files explain included and omitted context.
  2. Assumptions are separate from facts.
  3. Memory proposals require evidence before promotion.
  4. Onboarding output includes policies, recent handoffs, contradictions, stale-risk warnings, and allowed next actions.
  5. Decision-record suggestions are generated only as proposals.

Milestone 8b · Policy tests, delegation, and budgets

Policy fixture test harness

Required policy regression tests

Human delegation-point model

Budget and quota model

Budget receipts

Budget enforcement hooks

Commands: craik policy test · craik delegations list · craik budgets show <task-id>

Acceptance:

  1. Policy tests cover immutable paths, memory-proposal defaults, fail-open receipts, automation fail-closed behavior, runner grant boundaries, and redaction.
  2. Unresolved delegation points appear in handoffs.
  3. Resolved delegation points create receipts.
  4. Budgets appear in case files and receipts.
  5. Budget exhaustion blocks or escalates according to policy.
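
Budget enforcement that blocks or escalates according to policy could be modeled roughly like this. `Budget`, `OnExhaustion`, and the unit of `amount` are hypothetical; the plan does not say what is being counted.

```python
from dataclasses import dataclass
from enum import Enum


class OnExhaustion(Enum):
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass
class Budget:
    limit: int
    on_exhaustion: OnExhaustion
    spent: int = 0

    def charge(self, amount: int) -> str:
        """Record spend, or return the policy outcome if it would exceed the limit."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.spent + amount > self.limit:
            # Nothing is charged when the request is refused.
            return self.on_exhaustion.value
        self.spent += amount
        return "ok"
```

Returning the outcome instead of raising leaves the caller free to write a budget receipt for every decision, including refusals.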

Milestone 8c · Instruction distillation and runtime quality

Runtime instruction source registry: receipted registration records declared sources before ingestion or promotion.

Markdown instruction distiller: parses registered instruction files into deterministic statement candidates without store writes.

Source hash tracking: refreshes active registered sources from real repo files, stores newline-normalized SHA-256 snapshots, and marks changed or missing sources for stale invalidation.

Line/range provenance: persists one deterministic provenance record per parsed statement with source snapshot, line and column range, summary, and excerpt hash.

Distillation proposal store: categorizes provenanced statements with deterministic rules and writes reviewable proposals while surfacing unclassified candidates.

Inter-source contradictions: normalizes proposal triples across sources and opens contradiction reports for diverging policy, boundary, command, instruction, and security-rule guidance.

Operator approval flow: requires explicit operator approval or denial receipts before proposals become governing runtime constraints.

Override review hardening: requires explicit override rationale when stale or contradicted proposals are approved through either promotion path.

Case-file distillation evidence: loads governing distillations into case files with category, source, provenance ranges, and approval receipt snapshots.

Prompt distillation section: renders governing distillations in compiled prompts as a separate ordered authoritative section with stale-exclusion warnings.

Instruction distillation CLI: exposes source registration, filtered proposal listing, approve/reject decisions, and provenance-aware item inspection.

Task intent lock

Expiring scratchpad

Scope-change proposal model

Self-audit checklist

Runtime critic

Evidence coverage score

Handoff quality score

Tool-result attestation

Memory-impact preview

Commands: craik instructions scan <project-id> · craik instructions distill <project-id> · craik intent show <task-id> · craik scratchpad list <task-id> · craik quality check <task-id> · craik memory preview <task-id>

Acceptance:

  1. Instruction distillation uses declared sources only.
  2. Every extracted statement has stable provenance linked to the source snapshot.
  3. Categorized distillation proposals are deterministic and report unclassified candidates.
  4. Inter-source conflicts surface as contradiction reports and defer conflicted proposals.
  5. Only explicitly approved governing instructions are visible to runtime consumers.
  6. Stale or contradicted approvals require recorded override rationale.
  7. Case files carry governing distillations as first-class evidence with provenance.
  8. Compiled prompts include governing distillations as a separate authoritative section.
  9. Operators can drive the distillation lifecycle through craik instructions.
  10. Source-hash changes invalidate stale distillations.
  11. Extracted instruction facts remain proposals until approved.
  12. Intent lock is included in case files and handoffs.
  13. Scratchpad does not become durable memory without promotion.
  14. Quality gates flag unsupported claims and missing validation.
  15. Memory-impact preview appears before direct Stigmem writes.
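
The newline-normalized SHA-256 snapshot and stale detection described in this milestone can be sketched with the standard library. The function names and the `"changed"` / `"missing"` labels are hypothetical.

```python
import hashlib


def source_hash(raw: bytes) -> str:
    """SHA-256 over content with CRLF and bare CR normalized to LF."""
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.sha256(normalized).hexdigest()


def stale_sources(snapshots: dict, current: dict) -> dict:
    """Compare stored hashes to current file bytes.

    snapshots maps path -> stored hex digest; current maps path -> file bytes.
    Returns only the paths whose distillations should be invalidated.
    """
    status = {}
    for path, stored in sorted(snapshots.items()):
        if path not in current:
            status[path] = "missing"
        elif source_hash(current[path]) != stored:
            status[path] = "changed"
    return status
```

Normalizing line endings before hashing avoids invalidating distillations when a file is merely checked out with different end-of-line settings.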

Milestone 8d · Runner intelligence and continuity

Runner capability matrix

Agent workload memory

Known traps registry

Evidence expiration rules

Knowledge-freshness probes

Policy-aware prompt compiler

Real-runner contract test harness

Work-product classification

"What changed since last time" deltas

Recovery mode

Red-team mode

Commands: craik runners matrix · craik traps list <project-id> · craik freshness probe <task-id> · craik prompt compile <task-id> --runner <runner> · craik recover <task-id> · craik delta <task-id>

Acceptance:

  1. Case files include known traps and freshness state.
  2. Prompt compilation uses policy, context contracts, runner capabilities, and output schemas.
  3. Recovery mode can resume from partial receipts, scratchpad, changed files, and unfinished handoff.
  4. Real-runner contract tests validate adapter output shape.
  5. Red-team mode can be required by policy.
  6. Task starts can show relevant deltas since the last related run.

Milestone 9 · Contradictions and memory diff

Contradiction report model

Contradiction store

Contradiction list / show / resolve

Memory diff command

Resolution-to-memory-proposal flow

Commands: craik contradictions list · craik contradictions show <id> · craik contradictions resolve <id> · craik memory diff <task-id>

Acceptance:

  1. Contradictions can be opened by an agent or user.
  2. Contradictions are not overwritten by later facts.
  3. Resolutions record rationale.
  4. Memory diff includes contradiction state.

Milestone 10 · Single-agent execution loop

Build after runner contracts and the durable single-agent state model are stable.

Task-run state machine

Run id and status model

Plan / act / observe / evaluate / continue / stop phases

Runner step contract

Max-iteration limit

Timeout and budget checks

Intent-lock stop-condition enforcement

Approval and grant checks before side effects

Step receipts

Observed-output capture

Memory proposal hooks

Handoff on completion / block / failure / interruption

Run resume

Run recovery

Agent exit discipline

Commands: craik task run <task-id> --runner <runner> · craik runs show <run-id> · craik runs resume <run-id>

Acceptance:

  1. Craik, not the chat transcript, owns the loop boundary.
  2. Every side-effecting step checks policy before execution.
  3. Each important step can produce a receipt.
  4. Stop conditions halt the run before scope drift.
  5. Iteration, budget, and timeout limits are enforced.
  6. Interrupted runs can resume from persisted state.
  7. Blocked or failed runs produce handoffs.
  8. Memory updates remain proposals unless policy grants direct writes.
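
To make "Craik owns the loop boundary" concrete, here is a deliberately small loop that enforces an iteration limit and records a receipt per step. `RunState`, `run_loop`, and the status strings are hypothetical, and the `step` callable stands in for a real runner step contract.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    run_id: str
    status: str = "running"  # running | completed | blocked
    iterations: int = 0
    receipts: list = field(default_factory=list)


def run_loop(state: RunState, step: Callable[[int], bool], max_iterations: int) -> RunState:
    """Drive steps until one reports completion or the iteration limit is hit.

    step receives the 1-based iteration number and returns True when done.
    """
    while state.status == "running":
        if state.iterations >= max_iterations:
            # The runtime, not the runner, decides when to stop.
            state.status = "blocked"
            state.receipts.append("stop: max iterations reached")
            break
        state.iterations += 1
        done = step(state.iterations)
        outcome = "done" if done else "continue"
        state.receipts.append(f"step {state.iterations}: {outcome}")
        if done:
            state.status = "completed"
    return state
```

Because the state object carries the iteration count and receipts, persisting it after each step would be enough to resume an interrupted run from the same position.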

Milestone 11 · Multi-agent orchestration

Build after the single-agent durable loop works.

Role manifests

Orchestrator task decomposition

Child task creation

Worker-result validation

Specialist handoffs

Parent handoff merge

Parallel read-only execution

Commands: craik roles list · craik task split <task-id> · craik task run --multi-agent <task-id>

Acceptance:

  1. Specialists receive scoped case files.
  2. Worker results validate.
  3. Child handoffs link to parent.
  4. Read-only work can run in parallel.
  5. Unresolved contradictions block flattening into a final answer.

Milestone 12 · First-class runner adapters

Runner adapter interface

Codex adapter

Claude adapter

Gemini adapter

Runner metadata capture

Worker-result normalization

Handoff normalization

Memory-proposal normalization

Failure / block reporting

Commands: craik runners list · craik runners inspect <runner> · craik task run --runner codex <task-id> · craik task run --runner claude <task-id> · craik task run --runner gemini <task-id>

Acceptance:

  1. Codex, Claude, and Gemini adapters implement the same interface.
  2. Each adapter consumes case files and policy envelopes.
  3. Each adapter emits typed worker results or clear block / failure states.
  4. Adapter outputs can create handoffs and receipts.
  5. Runner-specific metadata is preserved without polluting core contracts.
  6. Adjacent runtime bridges remain future integrations rather than required execution layers.

Milestone 13 · Skills and probationary plugins

Skill directory discovery

Project-scoped skills

Global skills

Context-contract declarations

Plugin descriptor model

Probationary plugin policy

Plugin receipt requirements

Acceptance:

  1. Skills alter case-file guidance without changing code.
  2. Project skills override global skills.
  3. Plugins expose typed capabilities.
  4. Probationary plugins have limited grants.
  5. Plugin use appears in receipts.

First end-to-end scenario

This is the MVP target scenario. It should be automated as a fixture-driven integration test before the platform is broadened.

  1. Register eidetic-labs/stigmem as the first demo project.
  2. Connect to local Stigmem.
  3. Create a docs-reconciliation task.
  4. Build a case file from repo docs, ADRs, facts, and GitHub state.
  5. Run a governed agent with docs-write capability.
  6. Capture receipts for file writes and validation commands.
  7. Generate a handoff.
  8. Propose or write facts about the new state.
  9. Export work graph for the task.

The scenario explicitly validates:

  1. ADRs are treated as immutable inputs.
  2. Public docs do not receive internal-only labels or implementation tracking terms.
  3. Stale docs are identified with evidence.
  4. Stigmem facts are used as context with provenance.
  5. Memory writes are proposed or written according to policy.
  6. The final handoff can seed a follow-up task.

Deferred decisions

These should be decided before coding starts, but they should not block the planning docs.

Hosted service posture

Relationship to existing Eidetic auth

Whether the first UI is built into Craik or kept separate

Decided project defaults

| Decision | Value | Notes |
| --- | --- | --- |
| License | MIT | Code reuse only; governance, contribution terms, and trademark covered separately. |
| Public repository | GitHub | eidetic-labs/craik |
| Product framing | framing | Durable agent runtime. |
| Reference memory substrate | substrate | Stigmem. |
| Min Stigmem compatibility | required | Health · well-known metadata · authenticated fact read/write/query · fact provenance · scopes · confidence · source fields. |
| Initial interface | surface | CLI-first. |
| First demo target | scenario | Stigmem documentation and state reconciliation. |
| Initial first-class runners | runners | Codex · Claude · Gemini. |
| Adjacent runtime relationship | posture | Design reference and possible future bridge; not a required dependency. |
| Differentiator objective | posture | Evidence-first, assumption-aware, policy-tested, budgeted, human-delegable agent work. |
| Core language | language | Python 3.12+. |
| PyPI distribution | package | craik |
| Python module | module | craik |
| CLI command | command | craik |
| Future npm package | reserved | craik (if needed). |
| Default local home | path | ~/.craik |
| Local home override | env | CRAIK_HOME |
| Project-local metadata | posture | Opt-in only. |
| Default policy profile | policy | Strict. |
| Fail-open behavior | policy | Allowed only through explicit named policy profiles. |
| Trusted-local profile | policy | Opt-in fail-open with mandatory receipts. |
| Automation profile | policy | Fail-closed. |
| Memory writes | policy | Proposals by default. |
| Secrets storage | policy | ~/.craik/secrets/ or env vars · redacted before persistence. |
| Initial CLI framework | tool | Typer. |
| Contract validation | tool | Pydantic. |
| Local state | tool | SQLite. |
| API client | tool | httpx. |
| Test & quality gates | tool | pytest · ruff · mypy. |

Name availability snapshot.

Live registry checks on 2026-05-15 returned 404 for both https://pypi.org/pypi/craik/json and https://registry.npmjs.org/craik — the names appeared available at that time. Publish early once package metadata is ready. If the plain distribution name is lost before publication, fall back to craik-runtime while preserving craik for the module and CLI command.

Contribution and trademark follow-up

MIT governs code reuse but not project governance, contribution terms, or trademark rights. Initial lightweight governance lives in root-level policy files.

| Standard | File | Notes |
| --- | --- | --- |
| Contribution guide | file | CONTRIBUTING.md |
| Contribution certification | policy | DCO, not CLA. |
| Code of conduct | policy | Contributor Covenant 2.1 baseline. |
| Security disclosure | file | Private report path in SECURITY.md. |
| Trademark guidance | file | TRADEMARKS.md |
| Maintainer & release policy | file | MAINTAINERS.md |

Before broad external contribution, revisit: final security contact · DCO enforcement automation · release automation · package publishing ownership · whether a dedicated governance document is needed after 0.1.0.

What's next