Implementation Plan
What you'll find here
The buildable sequence that turns the Craik concept into a shipped runtime: stack, repository shape, milestones with build / commands / acceptance, the first end-to-end scenario, deferred decisions, and the decided project defaults.
CLI-first, contract-first, test-first.
Each milestone is a buildable slice with concrete commands and acceptance criteria. A UI waits until the CLI workflow proves the runtime contracts are correct.
Accepted stack
pytest · ruff · mypy
Dependency posture.
Favor reproducibility. Exact pins for runtime dependencies once implementation begins; lockfile updates reviewed intentionally. Optional provider, browser, UI, or adapter dependencies stay as extras rather than core dependencies.
Repository shape
Initial structure:
craik/
__init__.py
cli.py
contracts/
task.py
project.py
policy.py
case_file.py
receipt.py
handoff.py
memory.py
graph.py
evidence.py
assumption.py
delegation.py
intent.py
instruction.py
quality.py
artifact.py
runtime/
project_registry.py
paths.py
case_assembler.py
executor.py
handoff_writer.py
receipt_store.py
policy_engine.py
budget.py
prompt_compiler.py
critic.py
distiller.py
memory/
base.py
ephemeral.py
local.py
stigmem.py
adapters/
repo.py
github.py
runners/
base.py
codex.py
claude.py
gemini.py
orchestration/
roles.py
orchestrator.py
worker_result.py
graph/
store.py
export.py
tests/
docs/
Milestone 1 · Contract foundation
Package skeleton
CLI skeleton
Pydantic models for core contracts
Schema version fields
JSON serialization / deserialization
Validation tests
Commands: craik version · craik schema list · craik schema show <name>
Acceptance:
- All schemas validate sample fixtures.
- Invalid fixtures produce clear errors.
- Docs examples match real schemas.
- CI runs lint, type, and test checks.
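As a rough picture of the Milestone 1 contract work, the sketch below shows a versioned model that round-trips through JSON and rejects bad fixtures with a readable error. It uses stdlib dataclasses as a stand-in for the Pydantic models named above so that it runs without dependencies. The `TaskContract` class, its field names, and the `"1"` version string are illustrative assumptions, not the real Craik schema.

```python
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = "1"  # hypothetical version string


class ContractError(ValueError):
    """Raised when a fixture does not match the contract."""


@dataclass(frozen=True)
class TaskContract:
    # Field names are assumptions for illustration only.
    task_id: str
    title: str
    schema_version: str = SCHEMA_VERSION

    def __post_init__(self) -> None:
        if not self.task_id:
            raise ContractError("task_id must be a non-empty string")
        if not self.title.strip():
            raise ContractError("title must not be blank")
        if self.schema_version != SCHEMA_VERSION:
            raise ContractError(
                f"unsupported schema_version {self.schema_version!r}; "
                f"expected {SCHEMA_VERSION!r}"
            )

    def to_json(self) -> str:
        # sort_keys keeps serialized fixtures stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "TaskContract":
        data = json.loads(raw)
        unknown = set(data) - {"task_id", "title", "schema_version"}
        if unknown:
            raise ContractError(f"unknown fields: {sorted(unknown)}")
        try:
            return cls(**data)
        except TypeError as exc:  # a required field is missing
            raise ContractError(str(exc)) from exc
```

A validation test would load each sample fixture through `from_json` and assert that invalid fixtures raise `ContractError` with a message naming the offending field.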
Milestone 2 · Project registry and local state
Craik home path resolver
CRAIK_HOME override
Default ~/.craik home
Structured home subdirectories
Secure permissions for home and secrets
SQLite local store
Project registry
Repo detection
Immutable path config
Local memory backend
Commands: craik init · craik project add <path> · craik project list · craik project show <id>
Acceptance:
- Default home resolves to ~/.craik.
- CRAIK_HOME overrides the default.
- config, secrets, state, cache, logs, receipts, handoffs, case-files, and projects directories are created as needed.
- Secrets files are owner-readable/writable where supported.
- Project-local .craik/ is created only by explicit opt-in.
- Project can be registered from a Git repo.
- Registry persists between commands.
- Project profile validates.
- Immutable path policy is stored.
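The home-resolution rules above can be sketched in a few lines. This is a minimal illustration, assuming owner-only access on the secrets directory is an acceptable reading of "secure permissions"; the function names `resolve_home` and `ensure_home` are hypothetical, and the SQLite store and registry are left out.

```python
import os
import stat
from pathlib import Path

# Subdirectory names come from the acceptance list above.
HOME_SUBDIRS = (
    "config", "secrets", "state", "cache", "logs",
    "receipts", "handoffs", "case-files", "projects",
)


def resolve_home(env=None) -> Path:
    """Return CRAIK_HOME if set, otherwise ~/.craik."""
    env = os.environ if env is None else env
    override = env.get("CRAIK_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".craik"


def ensure_home(home: Path) -> Path:
    """Create the home layout and tighten permissions on secrets."""
    for name in HOME_SUBDIRS:
        (home / name).mkdir(parents=True, exist_ok=True)
    try:
        # Owner-only access; this has limited effect on Windows.
        os.chmod(home / "secrets", stat.S_IRWXU)
    except OSError:
        pass  # "where supported": skip filesystems without POSIX modes
    return home
```

Passing the environment as a parameter keeps the resolver testable without mutating `os.environ`.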
Milestone 3 · Case-file assembly
Repository adapter
Branch / diff inspection
Docs discovery
Default repository-context exclusions
For generated, dependency, build, cache, archive-heavy paths.
Project + user override rules
For discovery include / exclude behavior.
ADR / policy discovery
Stigmem / local fact loading
Stale-risk section
Evidence references
Assumption ledger
Context budget accounting
Context debt
For paths skipped by defaults, overrides, or budget pressure.
Verification-plan section
Markdown + JSON output
Commands: craik task create --project <id> --title "..." · craik case build <task-id> · craik case show <task-id>
Acceptance:
- Case file includes repo status.
- Docs and immutable paths are labeled.
- Generated and dependency paths are excluded by default unless explicitly included.
- Project and user overrides can extend, replace, or explicitly include paths from the default discovery rules.
- Excluded paths are visible in case-file context metadata rather than silently disappearing.
- Facts include source / confidence.
- Unsupported conclusions are tracked as assumptions.
- Included and omitted context is explainable.
- Output is deterministic for fixtures.
- Missing context is clearly reported.
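To make the "excluded paths stay visible" requirement concrete, here is a small sketch of discovery that records every skipped path together with the rule that skipped it. The default patterns, the `Discovery` class, and the `discover` function are assumptions for illustration; the real exclusion list is not specified here, and the sketch covers only the extend and explicit-include override modes, not replace.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch

# Hypothetical defaults; the real exclusion list is not specified above.
DEFAULT_EXCLUDES = ("node_modules/*", "dist/*", "build/*", ".venv/*", "*.zip")


@dataclass
class Discovery:
    included: list = field(default_factory=list)
    # Context debt: every skipped path is kept with the reason it was skipped.
    skipped: dict = field(default_factory=dict)


def discover(paths, extra_excludes=(), force_include=()):
    """Split paths into included context and explained omissions."""
    result = Discovery()
    patterns = DEFAULT_EXCLUDES + tuple(extra_excludes)
    for path in sorted(paths):  # sorted for deterministic output
        if any(fnmatch(path, p) for p in force_include):
            result.included.append(path)
            continue
        hit = next((p for p in patterns if fnmatch(path, p)), None)
        if hit is None:
            result.included.append(path)
        else:
            result.skipped[path] = f"excluded by rule {hit!r}"
    return result
```

For example, `discover(["docs/a.md", "dist/app.js"], force_include=["dist/*"])` keeps both paths, while dropping the `force_include` argument moves `dist/app.js` into `skipped` with its matching rule.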
Milestone 4 · Policy and receipts
Policy envelope generation
Strict / trusted-local / automation profiles
Explicit fail-open profile handling
Capability grant model
Immutable path protection
Central redaction utility
Receipt store
Policy denial receipts
Shell-command receipt wrapper
File-write receipt wrapper
Commands: craik policy show <task-id> · craik receipts list <task-id> · craik receipts show <receipt-id>
Acceptance:
- Strict mode is default.
- Fail-open is available only through named policy profiles.
- Denied writes are blocked.
- Allowed actions create receipts.
- Fail-open decisions create receipts.
- Receipts are redacted before persistence.
- Shell-command results are summarized.
- Receipts link back to task and policy envelope.
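The interplay of immutable-path protection, receipts, and redaction might look like the following sketch. The secret pattern, the receipt fields, and the in-memory list standing in for the receipt store are all assumptions; a real redaction utility would need a much broader rule set.

```python
import re
from fnmatch import fnmatch

# Hypothetical secret pattern: key=value pairs with sensitive-looking keys.
SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*=\s*\S+")


def redact(text: str) -> str:
    """Replace secret values with a fixed marker, keeping the key name."""
    return SECRET_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def check_write(path: str, immutable_globs, task_id: str) -> dict:
    """Return a receipt-like dict for an attempted file write."""
    denied = any(fnmatch(path, g) for g in immutable_globs)
    return {
        "task_id": task_id,
        "action": "file_write",
        "path": path,
        "decision": "denied" if denied else "allowed",
    }


def persist(receipt: dict, store: list) -> dict:
    """Redact every string field, then append to the (in-memory) store."""
    clean = {k: redact(v) if isinstance(v, str) else v for k, v in receipt.items()}
    store.append(clean)
    return clean
```

Routing every receipt through a single `persist` function is one way to guarantee that nothing reaches storage unredacted, which is the property the acceptance list asks for.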
Milestone 5 · Handoff loop
Structured handoff writer
Markdown handoff writer
Handoff validation
Handoff load into case file
Memory proposal attachment
Commands: craik handoff create <task-id> · craik handoff show <task-id> · craik handoff list --project <id>
Acceptance:
- Every completed task can produce a handoff.
- Handoff includes receipts.
- Handoff includes verification state.
- Handoff is loaded into the next related case file.
- Handoff schema validates.
Milestone 6 · Stigmem backend
Stigmem config
API key setup
Backend capability detection
Fact search / read / write
Fact proposal mapping
Provenance reads
Optional recall support
Optional conflict support
Handoff summary fact writes
Memory diff for task runs
Commands: craik connect stigmem --url <url> · craik memory search <query> · craik memory propose <task-id> · craik memory diff <task-id>
Acceptance:
- Craik can connect to a local Stigmem node.
- Backend verifies GET /healthz and GET /.well-known/stigmem.
- Failed auth has a clear error.
- Facts are included in case files.
- Direct fact writes require memory-write grant.
- Proposed facts can be reviewed before write.
- Written facts include provenance.
- Optional recall / conflict capabilities are detected without becoming required.
- Craik falls back to local proposals, local contradiction reports, and local memory diffs when optional Stigmem capabilities are unavailable.
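The optional-capability fallback can be modelled as a pure decision over whatever the backend advertises. The capability names below (`facts`, `recall`, `conflicts`) and the `BackendPlan` shape are assumptions; this plan does not describe the Stigmem discovery payload, and no network call is made in the sketch.

```python
from dataclasses import dataclass

# Capability names are assumptions; the Stigmem discovery payload is not
# described in this plan.
REQUIRED = frozenset({"facts"})
OPTIONAL = frozenset({"recall", "conflicts"})


@dataclass(frozen=True)
class BackendPlan:
    usable: bool
    remote: frozenset  # optional features served by Stigmem
    local: frozenset   # optional features Craik handles locally


def plan_backend(advertised) -> BackendPlan:
    """Decide which optional features fall back to local handling."""
    advertised = frozenset(advertised)
    usable = REQUIRED <= advertised
    remote = OPTIONAL & advertised if usable else frozenset()
    return BackendPlan(usable=usable, remote=remote, local=OPTIONAL - remote)
```

Keeping the decision separate from the HTTP probe means the fallback behavior can be tested with plain fixtures.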
Milestone 7 · GitHub adapter
GitHub auth detection
Repo mapping
Issue / PR reads
Changed-file reads
Check-status reads
Guarded issue / PR / comment writes
Commands: craik github status <project-id> · craik github issues <project-id> · craik github prs <project-id>
Acceptance:
- Case file includes relevant GitHub state.
- GitHub writes require grants.
- Created links appear in handoff.
- Unauthenticated mode remains usable for local-only tasks.
Milestone 8 · Work graph
Graph node / event models
Graph store
Export command
Task / handoff / fact / receipt graph links
Contradiction graph links
Human delegation-point nodes
Commands: craik graph export <project-id> · craik graph show-task <task-id>
Acceptance:
- Each task creates graph nodes.
- Handoffs and receipts are linked.
- Fact proposals link to evidence.
- Graph export is deterministic.
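Deterministic export mostly comes down to removing every source of ordering noise before serialization. A minimal sketch, assuming nodes carry an `id` key and edges carry `source`, `target`, and `kind` keys (these field names are illustrative, not the real graph models):

```python
import json


def export_graph(nodes, edges) -> str:
    """Serialize a work graph so equal graphs give byte-identical output."""
    payload = {
        # Sort by id so insertion order never leaks into the export.
        "nodes": sorted(nodes, key=lambda n: n["id"]),
        "edges": sorted(edges, key=lambda e: (e["source"], e["target"], e["kind"])),
    }
    return json.dumps(payload, sort_keys=True, indent=2)
```

A fixture test can then compare the export against a checked-in golden file byte for byte.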
Milestone 8a · Evidence, assumptions, and onboarding
Evidence reference model
Assumption ledger model
Belief promotion lifecycle model
Onboarding command
Context budgeting metadata
Provenance-aware documentation links
Decision-record suggestion hooks
Commands: craik onboard --project <project-id> · craik assumptions list <task-id> · craik evidence show <task-id>
Acceptance:
- Case files explain included and omitted context.
- Assumptions are separate from facts.
- Memory proposals require evidence before promotion.
- Onboarding output includes policies, recent handoffs, contradictions, stale-risk warnings, and allowed next actions.
- Decision-record suggestions are generated only as proposals.
Milestone 8b · Policy tests, delegation, and budgets
Policy fixture test harness
Required policy regression tests
Human delegation-point model
Budget and quota model
Budget receipts
Budget enforcement hooks
Commands: craik policy test · craik delegations list · craik budgets show <task-id>
Acceptance:
- Policy tests cover immutable paths, memory-proposal defaults, fail-open receipts, automation fail-closed behavior, runner grant boundaries, and redaction.
- Unresolved delegation points appear in handoffs.
- Resolved delegation points create receipts.
- Budgets appear in case files and receipts.
- Budget exhaustion blocks or escalates according to policy.
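"Blocks or escalates according to policy" suggests that the budget itself carries the exhaustion behavior. The sketch below assumes a single abstract unit and a two-valued policy; the `Budget` class and `OnExhaustion` enum are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class OnExhaustion(Enum):
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass
class Budget:
    # Unit is left abstract (tokens, seconds, tool calls); an assumption here.
    limit: int
    on_exhaustion: OnExhaustion
    spent: int = 0

    def charge(self, amount: int) -> str:
        """Record spend, or return the policy outcome if it would overrun."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.spent + amount > self.limit:
            return self.on_exhaustion.value  # nothing is charged
        self.spent += amount
        return "ok"
```

Each call to `charge` is a natural point to emit a budget receipt, whether the outcome is `ok`, `block`, or `escalate`.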
Milestone 8c · Instruction distillation and runtime quality
Runtime instruction source registry
Receipted registration records declared sources before ingestion or promotion.
Markdown instruction distiller
Parses registered instruction files into deterministic statement candidates without store writes.
Source hash tracking
Refreshes active registered sources from real repo files, stores newline-normalized SHA-256 snapshots, and marks changed or missing sources for stale invalidation.
Line/range provenance
Persists one deterministic provenance record per parsed statement with source snapshot, line and column range, summary, and excerpt hash.
Distillation proposal store
Categorizes provenanced statements with deterministic rules and writes reviewable proposals while surfacing unclassified candidates.
Inter-source contradictions
Normalizes proposal triples across sources and opens contradiction reports for diverging policy, boundary, command, instruction, and security-rule guidance.
Operator approval flow
Requires explicit operator approval or denial receipts before proposals become governing runtime constraints.
Override review hardening
Requires explicit override rationale when stale or contradicted proposals are approved through either promotion path.
Case-file distillation evidence
Loads governing distillations into case files with category, source, provenance ranges, and approval receipt snapshots.
Prompt distillation section
Renders governing distillations in compiled prompts as a separate ordered authoritative section with stale-exclusion warnings.
Instruction distillation CLI
Exposes source registration, filtered proposal listing, approve/reject decisions, and provenance-aware item inspection.
Task intent lock
Expiring scratchpad
Scope-change proposal model
Self-audit checklist
Runtime critic
Evidence coverage score
Handoff quality score
Tool-result attestation
Memory-impact preview
Commands: craik instructions scan <project-id> · craik instructions distill <project-id> · craik intent show <task-id> · craik scratchpad list <task-id> · craik quality check <task-id> · craik memory preview <task-id>
Acceptance:
- Instruction distillation uses declared sources only.
- Every extracted statement has stable provenance linked to the source snapshot.
- Categorized distillation proposals are deterministic and report unclassified candidates.
- Inter-source conflicts surface as contradiction reports and defer conflicted proposals.
- Only explicitly approved governing instructions are visible to runtime consumers.
- Stale or contradicted approvals require recorded override rationale.
- Case files carry governing distillations as first-class evidence with provenance.
- Compiled prompts include governing distillations as a separate authoritative section.
- Operators can drive the distillation lifecycle through craik instructions.
- Source-hash changes invalidate stale distillations.
- Extracted instruction facts remain proposals until approved.
- Intent lock is included in case files and handoffs.
- Scratchpad does not become durable memory without promotion.
- Quality gates flag unsupported claims and missing validation.
- Memory-impact preview appears before direct Stigmem writes.
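The source-hash tracking item above mentions newline-normalized SHA-256 snapshots and marking changed or missing sources. A minimal sketch of that check, assuming normalization means folding CRLF and bare CR to LF and that the three state labels are free to choose (the function names and labels are illustrative):

```python
import hashlib


def source_hash(text: str) -> str:
    """SHA-256 over content with CRLF and CR line endings folded to LF."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def source_state(stored_hash: str, current_text) -> str:
    """Classify a registered source; current_text is None if the file is gone."""
    if current_text is None:
        return "missing"
    return "fresh" if source_hash(current_text) == stored_hash else "changed"
```

Normalizing before hashing keeps a checkout with different line endings from invalidating every distillation, while any real content change still flips the state to `changed`.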
Milestone 8d · Runner intelligence and continuity
Runner capability matrix
Agent workload memory
Known traps registry
Evidence expiration rules
Knowledge-freshness probes
Policy-aware prompt compiler
Real-runner contract test harness
Work-product classification
"What changed since last time" deltas
Recovery mode
Red-team mode
Commands: craik runners matrix · craik traps list <project-id> · craik freshness probe <task-id> · craik prompt compile <task-id> --runner <runner> · craik recover <task-id> · craik delta <task-id>
Acceptance:
- Case files include known traps and freshness state.
- Prompt compilation uses policy, context contracts, runner capabilities, and output schemas.
- Recovery mode can resume from partial receipts, scratchpad, changed files, and unfinished handoff.
- Real-runner contract tests validate adapter output shape.
- Red-team mode can be required by policy.
- Task starts can show relevant deltas since the last related run.
Milestone 9 · Contradictions and memory diff
Contradiction report model
Contradiction store
Contradiction list / show / resolve
Memory diff command
Resolution-to-memory-proposal flow
Commands: craik contradictions list · craik contradictions show <id> · craik contradictions resolve <id> · craik memory diff <task-id>
Acceptance:
- Contradictions can be opened by an agent or user.
- Contradictions are not overwritten by later facts.
- Resolutions record rationale.
- Memory diff includes contradiction state.
Milestone 10 · Single-agent execution loop
Build after runner contracts and the durable single-agent state model are stable.
Task-run state machine
Run id and status model
Plan / act / observe / evaluate / continue / stop phases
Runner step contract
Max-iteration limit
Timeout and budget checks
Intent-lock stop-condition enforcement
Approval and grant checks before side effects
Step receipts
Observed-output capture
Memory proposal hooks
Handoff on completion / block / failure / interruption
Run resume
Run recovery
Agent exit discipline
Commands: craik task run <task-id> --runner <runner> · craik runs show <run-id> · craik runs resume <run-id>
Acceptance:
- Craik, not the chat transcript, owns the loop boundary.
- Every side-effecting step checks policy before execution.
- Each important step can produce a receipt.
- Stop conditions halt the run before scope drift.
- Iteration, budget, and timeout limits are enforced.
- Interrupted runs can resume from persisted state.
- Blocked or failed runs produce handoffs.
- Memory updates remain proposals unless policy grants direct writes.
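The loop-ownership idea can be illustrated with a driver that enforces the iteration limit and checks policy before each proposed action takes effect. The `step` and `allowed` callables are hypothetical stand-ins for the runner step contract and the policy engine; timeouts, budgets, persistence, and handoffs are omitted.

```python
from enum import Enum


class RunStatus(Enum):
    COMPLETED = "completed"
    BLOCKED = "blocked"
    STOPPED = "stopped"  # iteration limit reached


def run_loop(step, allowed, max_iterations: int):
    """Drive a runner step function until it finishes or a limit trips.

    step(i) returns (proposed_action, done); allowed(action) is the policy
    check. Both callables are assumptions made for this sketch.
    """
    receipts = []
    for i in range(max_iterations):
        action, done = step(i)
        if not allowed(action):  # policy is checked before the side effect
            receipts.append({"step": i, "action": action, "decision": "denied"})
            return RunStatus.BLOCKED, receipts
        receipts.append({"step": i, "action": action, "decision": "allowed"})
        if done:
            return RunStatus.COMPLETED, receipts
    return RunStatus.STOPPED, receipts
```

Because the driver returns a status and a receipt list on every exit path, a blocked or stopped run still leaves enough state to write a handoff.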
Milestone 11 · Multi-agent orchestration
Build after the single-agent durable loop works.
Role manifests
Orchestrator task decomposition
Child task creation
Worker-result validation
Specialist handoffs
Parent handoff merge
Parallel read-only execution
Commands: craik roles list · craik task split <task-id> · craik task run --multi-agent <task-id>
Acceptance:
- Specialists receive scoped case files.
- Worker results validate.
- Child handoffs link to parent.
- Read-only work can run in parallel.
- Unresolved contradictions block flattening into a final answer.
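One possible shape for "read-only work can run in parallel" is to fan out only the children flagged read-only and keep everything else sequential. The child-task dict shape (`id`, `read_only`), the `run` callable, and the worker count are assumptions for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def run_children(children, run):
    """Run read-only child tasks concurrently and the rest one at a time.

    children: list of dicts with "id" and "read_only" keys (assumed shape).
    run: callable taking a child and returning its worker result.
    """
    read_only = [c for c in children if c["read_only"]]
    serial = [c for c in children if not c["read_only"]]
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map yields results in input order, so pairing with zip is safe.
        for child, result in zip(read_only, pool.map(run, read_only)):
            results[child["id"]] = result
    for child in serial:
        results[child["id"]] = run(child)
    return results
```

Keeping write-capable children sequential avoids two workers racing on the same files while still shortening the read-heavy part of a split task.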
Milestone 12 · First-class runner adapters
Runner adapter interface
Codex adapter
Claude adapter
Gemini adapter
Runner metadata capture
Worker-result normalization
Handoff normalization
Memory-proposal normalization
Failure / block reporting
Commands: craik runners list · craik runners inspect <runner> · craik task run --runner codex <task-id> · craik task run --runner claude <task-id> · craik task run --runner gemini <task-id>
Acceptance:
- Codex, Claude, and Gemini adapters implement the same interface.
- Each adapter consumes case files and policy envelopes.
- Each adapter emits typed worker results or clear block / failure states.
- Adapter outputs can create handoffs and receipts.
- Runner-specific metadata is preserved without polluting core contracts.
- Adjacent runtime bridges remain future integrations rather than required execution layers.
Milestone 13 · Skills and probationary plugins
Skill directory discovery
Project-scoped skills
Global skills
Context-contract declarations
Plugin descriptor model
Probationary plugin policy
Plugin receipt requirements
Acceptance:
- Skills alter case-file guidance without changing code.
- Project skills override global skills.
- Plugins expose typed capabilities.
- Probationary plugins have limited grants.
- Plugin use appears in receipts.
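The precedence rule "project skills override global skills" reduces to a keyed merge. In this sketch a skill is just a name mapped to guidance text, which is an assumption; the real skill descriptor is not defined in this plan.

```python
def resolve_skills(global_skills: dict, project_skills: dict) -> dict:
    """Map each skill name to (scope, guidance); project scope wins."""
    resolved = {name: ("global", text) for name, text in global_skills.items()}
    resolved.update(
        {name: ("project", text) for name, text in project_skills.items()}
    )
    return resolved
```

Recording the winning scope next to the guidance lets a case file explain where each piece of skill guidance came from.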
First end-to-end scenario
The MVP target scenario — automated as a fixture-driven integration test before broadening the platform.
- Register eidetic-labs/stigmem as the first demo project.
- Connect to local Stigmem.
- Create a docs-reconciliation task.
- Build a case file from repo docs, ADRs, facts, and GitHub state.
- Run a governed agent with docs-write capability.
- Capture receipts for file writes and validation commands.
- Generate a handoff.
- Propose or write facts about the new state.
- Export work graph for the task.
The scenario explicitly validates:
- ADRs are treated as immutable inputs.
- Public docs do not receive internal-only labels or implementation tracking terms.
- Stale docs are identified with evidence.
- Stigmem facts are used as context with provenance.
- Memory writes are proposed or written according to policy.
- The final handoff can seed a follow-up task.
Deferred decisions
These should be decided before coding starts, but they should not block the planning docs.
Hosted service posture
Relationship to existing Eidetic auth
Whether the first UI is built into Craik or kept separate
Decided project defaults
eidetic-labs/craik
craik
craik
craik
craik (if needed).
~/.craik
CRAIK_HOME
~/.craik/secrets/ or env vars · redacted before persistence.
httpx.
pytest · ruff · mypy.
Name availability snapshot.
Live registry checks on 2026-05-15 returned 404 for both
https://pypi.org/pypi/craik/json and
https://registry.npmjs.org/craik — the names appeared available at
that time. Publish early once package metadata is ready. If the plain
distribution name is lost before publication, fall back to
craik-runtime while preserving craik for the module and CLI command.
Contribution and trademark follow-up
MIT governs code reuse but not project governance, contribution terms, or trademark rights. Initial lightweight governance lives in root-level policy files.
CONTRIBUTING.md
SECURITY.md
TRADEMARKS.md
MAINTAINERS.md
Before broad external contribution, revisit: final security
contact · DCO enforcement automation · release automation · package
publishing ownership · whether a dedicated governance document is
needed after 0.1.0.