Version: MVP

Feature Specification

12 min read · For implementers · Updated 2026-05-19

What you'll find here

The implementable feature surface. Every numbered feature names its purpose, the MVP behavior the runtime ships, and the acceptance criteria that must hold for the feature to count as done.

This doc turns Craik's product ideas into implementable features.

Each feature is independently shippable but composes with the rest of the runtime through stable contracts. Acceptance criteria are the durable check — implementations are free to evolve as long as the criteria still hold.

Feature 1: Project registry

Purpose: Store known projects and their runtime configuration.

MVP behavior: craik project add <path> creates a project profile · profile records repo path, default branch, docs paths, immutable paths, memory backend · profile is printable as JSON.

  1. A project can be added from a local Git repo.
  2. Invalid paths fail with clear errors.
  3. Immutable doc paths can be configured.
  4. Project-profile schema validates.
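As an illustration of the behavior above, here is a minimal sketch of a project profile and the add step. The field names, the defaults, and the add_project helper are assumptions made for this example, not the shipped schema or CLI internals.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class ProjectProfile:
    # Field names are illustrative; the real profile schema may differ.
    repo_path: str
    default_branch: str = "main"
    docs_paths: list = field(default_factory=list)
    immutable_paths: list = field(default_factory=list)
    memory_backend: str = "local"

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def add_project(path: str, **overrides) -> ProjectProfile:
    """Create a profile for a local Git repo, or raise a clear error."""
    repo = Path(path).expanduser().resolve()
    if not repo.is_dir():
        raise ValueError(f"project path does not exist or is not a directory: {repo}")
    if not (repo / ".git").exists():
        raise ValueError(f"project path is not a Git repository (no .git found): {repo}")
    return ProjectProfile(repo_path=str(repo), **overrides)
```

Calling add_project(".", immutable_paths=["docs/adr"]).to_json() inside a repository would print the profile as JSON; a path without a .git entry fails with a message that names the path.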

Feature 2: Case-file assembler

Purpose: Build task-specific context before execution.

Inputs: task request · project profile · repository state · docs snippets · repository discovery defaults + project/user include/exclude overrides · ADR/policy snippets · GitHub state (when configured) · Stigmem facts (when configured) · recent handoffs.

Outputs: case file JSON · human-readable Markdown case file · stale-risk list · contradiction list · context inclusion/exclusion/omission metadata · verification plan.

  1. Case-file generation is deterministic for test fixtures.
  2. Generated, dependency, build, cache, and archive-heavy paths are excluded by default.
  3. Project and user overrides can extend, replace, or explicitly include paths from the defaults.
  4. Excluded and omitted paths are visible in context metadata.
  5. Immutable ADR paths are clearly labeled.
  6. Memory facts include source and confidence.
  7. Stale-risk warnings are separated from verified facts.
  8. Output can be consumed by an agent prompt.
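The default-exclusion and override criteria can be sketched as a small path filter. The glob patterns in DEFAULT_EXCLUDES and the select_paths signature are hypothetical; this spec does not list the actual discovery defaults.

```python
from fnmatch import fnmatch

# Hypothetical defaults; the real discovery defaults are not listed in this spec.
DEFAULT_EXCLUDES = ["node_modules/*", "dist/*", "build/*", ".cache/*", "*.zip"]


def select_paths(paths, extra_excludes=(), force_includes=(), replace_defaults=None):
    """Split paths into included and excluded, recording which pattern excluded each one."""
    excludes = list(DEFAULT_EXCLUDES if replace_defaults is None else replace_defaults)
    excludes += list(extra_excludes)
    included, excluded = [], []
    for path in sorted(paths):  # sorted so fixture output is deterministic
        if any(fnmatch(path, pat) for pat in force_includes):
            included.append(path)  # explicit include wins over any exclude
            continue
        hit = next((pat for pat in excludes if fnmatch(path, pat)), None)
        if hit is None:
            included.append(path)
        else:
            excluded.append({"path": path, "matched": hit})
    return {"included": included, "excluded": excluded}
```

Returning the excluded entries alongside the matching pattern is one way to keep exclusions visible in context metadata rather than dropping them silently.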

Feature 3: Policy envelope

Purpose: Define execution authority and obligations.

MVP behavior: read-only tasks default to repo-read, memory-read, and receipt-write · implementation tasks require explicit write grants · immutable paths cannot be written unless policy explicitly allows it · memory writes default to proposals unless configured for direct write · strict mode is the default profile · fail-open is profile-gated · fail-open appears in case files, receipts, and handoffs.

  1. Denied file writes are blocked.
  2. Denied memory writes become proposals.
  3. Policy envelope is included in the case file.
  4. Policy failures create receipts.
  5. Trusted-local fail-open requires explicit opt-in.
  6. Automation mode fails closed instead of widening permissions.
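A rough sketch of how the write checks above might be evaluated. PolicyEnvelope, its fields, and both helper functions are assumptions for illustration; the real envelope contract is defined elsewhere.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class PolicyEnvelope:
    # All field names here are hypothetical.
    profile: str = "strict"                             # strict is the default profile
    write_grants: set = field(default_factory=set)      # path prefixes the task may write
    immutable_paths: set = field(default_factory=set)
    allow_immutable_writes: bool = False
    direct_memory_write: bool = False


def _under(path: str, prefix: str) -> bool:
    p, root = PurePosixPath(path), PurePosixPath(prefix)
    return p == root or root in p.parents


def check_file_write(env: PolicyEnvelope, path: str):
    """Return (decision, reason). Immutable paths are checked before grants."""
    if any(_under(path, imm) for imm in env.immutable_paths) and not env.allow_immutable_writes:
        return ("deny", "immutable path")
    if not any(_under(path, grant) for grant in env.write_grants):
        return ("deny", "no write grant")
    return ("allow", "granted")


def route_memory_write(env: PolicyEnvelope, fact: dict) -> dict:
    """Denied or ungranted memory writes become proposals instead of failing."""
    kind = "write" if env.direct_memory_write else "proposal"
    return {"kind": kind, "fact": fact}
```

Checking immutability first means a broad grant such as docs does not accidentally cover docs/adr.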

Feature 4: Capability receipts

Purpose: Record important actions in a concise, queryable form.

Receipt-producing actions: file writes · shell commands · GitHub writes · memory writes · approvals · policy denials · contradiction opens/resolutions · handoff creation.

  1. Receipts are persisted locally.
  2. Receipts can be listed by task.
  3. Receipt IDs appear in handoffs.
  4. Receipts include actor, capability, target, reason, result, and timestamp.
  5. Receipts include policy profile and fail-open status.
  6. Receipt payloads are redacted before persistence.
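The receipt fields listed above could be modeled roughly as follows. The Receipt class, the make_receipt helper, and the redaction pattern are illustrative assumptions; a real redactor would need to cover many more secret shapes.

```python
import re
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Illustrative pattern only: matches "token=...", "password: ...", and similar.
SECRET_RE = re.compile(r"(?i)\b(token|password|secret|api[_-]?key)\b(\s*[=:]\s*)(\S+)")


def redact(text: str) -> str:
    return SECRET_RE.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)


@dataclass
class Receipt:
    task_id: str
    actor: str
    capability: str
    target: str
    reason: str
    result: str
    policy_profile: str
    fail_open: bool
    timestamp: str


def make_receipt(task_id, actor, capability, target, reason, result,
                 policy_profile="strict", fail_open=False) -> dict:
    """Build a receipt dict with free-text fields redacted before it is persisted."""
    receipt = Receipt(task_id, actor, capability, redact(target), redact(reason),
                      redact(result), policy_profile, fail_open,
                      datetime.now(timezone.utc).isoformat())
    return asdict(receipt)
```

Redacting inside the constructor path, rather than at write time, keeps unredacted payloads from ever reaching the persistence layer.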

Feature 5: Handoff writer

Purpose: Make agent work reusable by future agents.

MVP behavior: generate structured JSON handoff · generate Markdown handoff · link receipts · include memory proposals · include unresolved questions and next steps.

  1. Every completed task has a handoff.
  2. Handoff validates against schema.
  3. Handoff includes verification status.
  4. Handoff can be loaded into a future case file.

Feature 6: Memory store interface

Purpose: Separate the runtime from the memory backend.

EphemeralMemoryStore

For tests and demos. Resets between calls.

LocalMemoryStore

SQLite-backed local proposals and approved facts. Persists between CLI calls.

StigmemMemoryStore

The reference team-scale substrate. Reads facts, captures provenance, detects optional capabilities.

Required methods: search_facts(query, scope) · list_facts(entity, relation) · propose_fact(proposal) · write_fact(fact) · invalidate_fact(fact_id, reason) · diff(run_id).

  1. All backends implement the same interface.
  2. Tests run against the ephemeral backend.
  3. The local backend persists between CLI calls.
  4. The Stigmem backend reads and writes facts with provenance.
  5. The Stigmem backend detects optional recall and conflict capabilities.
  6. Direct Stigmem writes require grants.
  7. Unavailable optional Stigmem capabilities fall back to local Craik state.
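The shared interface can be sketched as a Python Protocol using the required method names above. The method names and arguments come from this spec; the fact dictionary keys (id, entity, relation, value, scope, run_id) and the in-memory implementation are assumptions for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MemoryStore(Protocol):
    def search_facts(self, query, scope): ...
    def list_facts(self, entity, relation): ...
    def propose_fact(self, proposal): ...
    def write_fact(self, fact): ...
    def invalidate_fact(self, fact_id, reason): ...
    def diff(self, run_id): ...


class EphemeralMemoryStore:
    """In-memory store for tests. The fact dict keys used here are assumed."""

    def __init__(self):
        self.facts, self.proposals, self.invalidated = {}, [], {}

    def search_facts(self, query, scope):
        return [f for f in self.facts.values()
                if f.get("scope") == scope and query.lower() in str(f.get("value", "")).lower()]

    def list_facts(self, entity, relation):
        return [f for f in self.facts.values()
                if f.get("entity") == entity and f.get("relation") == relation]

    def propose_fact(self, proposal):
        self.proposals.append(proposal)

    def write_fact(self, fact):
        self.facts[fact["id"]] = fact

    def invalidate_fact(self, fact_id, reason):
        fact = self.facts.pop(fact_id)
        self.invalidated[fact_id] = {"fact": fact, "reason": reason}

    def diff(self, run_id):
        def in_run(item):
            return item.get("run_id") == run_id
        return {
            "proposed": [p for p in self.proposals if in_run(p)],
            "written": [f for f in self.facts.values() if in_run(f)],
            "invalidated": [i for i in self.invalidated.values() if in_run(i["fact"])],
        }
```

Because the Protocol is runtime-checkable, a test can assert isinstance(store, MemoryStore) for each backend to confirm that the method names line up.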

Feature 7: GitHub adapter

Purpose: Connect Craik tasks to live collaboration state.

MVP reads

  • Repository metadata
  • Open issues
  • Open PRs
  • Branch status
  • Changed files
  • Comments
  • CI / check status

MVP writes

  • Create issue
  • Create PR
  • Create comment

  1. GitHub writes require capability grants.
  2. Reads fail gracefully when unauthenticated.
  3. PR / issue references are included in case files.
  4. Created links appear in handoffs.

Feature 8: Work graph

Purpose: Model agent work as connected state.

MVP nodes: task · handoff · fact · file · issue · PR · receipt · verification.

MVP edges: created_by · depends_on · verified_by · updates · blocks · contradicts.

  1. Every task creates a task node.
  2. Every handoff links to the task and its receipts.
  3. Memory proposals link to evidence.
  4. The graph can be exported as JSON.
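A minimal sketch of the graph using the node and edge types listed above. The WorkGraph class and the JSON layout ("nodes", "edges", "from", "to") are assumptions, not the export format Craik ships.

```python
import json

NODE_TYPES = {"task", "handoff", "fact", "file", "issue", "PR", "receipt", "verification"}
EDGE_TYPES = {"created_by", "depends_on", "verified_by", "updates", "blocks", "contradicts"}


class WorkGraph:
    def __init__(self):
        self.nodes, self.edges = {}, []

    def add_node(self, node_id, node_type, **attrs):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = {"id": node_id, "type": node_type, **attrs}

    def add_edge(self, source, edge_type, target):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.append({"from": source, "type": edge_type, "to": target})

    def to_json(self):
        return json.dumps({"nodes": list(self.nodes.values()), "edges": self.edges},
                          indent=2, sort_keys=True)
```

Rejecting unknown node and edge types at insert time keeps the exported JSON within the MVP vocabulary.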

Feature 9: Contradiction inbox

Purpose: Make disagreement operational.

MVP behavior: detect contradictions reported by agents · store contradiction reports · list open contradictions · resolve with rationale · create memory proposals from resolution.

  1. Contradictions are not silently overwritten.
  2. Resolution records evidence.
  3. Affected artifacts can be listed.
  4. Resolved contradictions appear in the memory diff.
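One possible shape for the inbox, shown only as a sketch. The class name, method names, and record keys are hypothetical.

```python
class ContradictionInbox:
    def __init__(self):
        self._items = {}

    def report(self, cid, claim_a, claim_b, affected=()):
        # Both claims are kept side by side; neither overwrites the other.
        self._items[cid] = {"id": cid, "claims": [claim_a, claim_b],
                            "affected": list(affected), "status": "open"}

    def list_open(self):
        return [c for c in self._items.values() if c["status"] == "open"]

    def resolve(self, cid, winner, rationale, evidence):
        if not rationale or not evidence:
            raise ValueError("resolution requires rationale and evidence")
        item = self._items[cid]
        item.update(status="resolved", rationale=rationale, evidence=list(evidence))
        # The resolution yields a memory proposal rather than a direct write.
        return {"kind": "proposal", "fact": winner, "source_contradiction": cid}
```

Requiring both a rationale and evidence at resolve time is one way to enforce that resolutions record evidence.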

Feature 10: Memory diff

Purpose: Explain how memory changed during a run.

MVP behavior: list proposed facts · list written facts · list invalidated facts · list contradictions opened/resolved · list handoff facts created.

  1. Memory diff can be printed for any task.
  2. Diff is linked from the handoff.
  3. Diff can be stored as a receipt artifact.

Feature 10a: Single-agent execution loop

Purpose: Drive one agent through a governed task loop while preserving state, policy checks, receipts, and stop conditions.

MVP behavior: persistent run id · plan → act → observe → evaluate → continue → stop phases · runner invoked through a step contract · bounded runner context from defaults + overrides · max-iteration, timeout, and budget enforcement · policy check before side effects · receipts for important steps · captured outputs · stop on intent-lock conditions · handoffs on completion / block / failure / interruption · resumable runs.

  1. Craik owns the loop boundary — never an untracked chat transcript.
  2. Loop state is inspectable.
  3. Runner context avoids generated and dependency-path pollution by default.
  4. Context defaults are overridable and their effects are inspectable.
  5. Side effects cannot bypass grants.
  6. Blocked approvals halt the loop.
  7. Memory updates are proposed unless direct writes are granted.
  8. Run recovery does not require replaying raw conversation history.
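The loop boundary can be sketched in a few lines. Here step and check_policy are hypothetical stand-ins for the step contract and the policy envelope, and only the max-iteration limit is modeled; timeout and budget enforcement are left out for brevity.

```python
def run_loop(step, check_policy, max_iterations=5):
    """Drive `step` until it reports done, a policy check denies an action, or the cap is hit.

    `step(state)` returns {"action": ..., "done": bool}; `check_policy(action)` returns bool.
    """
    state = {"iteration": 0, "receipts": [], "status": "running"}
    while state["iteration"] < max_iterations:
        state["iteration"] += 1
        result = step(state)
        action = result.get("action")
        if action is not None:
            allowed = check_policy(action)  # policy check before any side effect
            state["receipts"].append({"action": action, "allowed": allowed})
            if not allowed:
                state["status"] = "blocked"  # a denied action halts the loop
                return state
        if result.get("done"):
            state["status"] = "completed"
            return state
    state["status"] = "max_iterations"
    return state
```

Because the whole run lives in the state dict, it can be inspected or persisted between iterations without replaying any conversation history.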

Feature 11: Orchestrator and specialists

Purpose: Support multi-agent workflows after the single-agent loop works.

MVP roles: orchestrator · researcher · docs reviewer · implementer · verifier · adjudicator.

Behavior: orchestrator decomposes the task into child tasks · specialists receive case-file excerpts · specialists return typed worker results · orchestrator merges results into the handoff · contradictions are escalated instead of flattened.

  1. Independent read-only tasks can run in parallel.
  2. Worker outputs validate against schema.
  3. Child handoffs link to the parent task.
  4. The orchestrator cannot discard unresolved contradictions.

Feature 11a: First-class runner adapters

Purpose: Let Craik work directly with real agent runners instead of requiring a separate agent framework as the execution layer.

Initial adapters: Codex · Claude · Gemini.

Adapter responsibilities: receive task request, case file, policy envelope, and grants · start or guide a runner session · preserve runner identity and version metadata · capture typed worker results · capture receipts or receipt inputs · capture handoff output · return proposed memory updates · report blocks, failures, or missing capabilities.

  1. Each adapter implements the same runner interface.
  2. Adapter outputs validate against Craik contracts.
  3. Runner-specific details do not leak into core contracts.
  4. Unsupported capabilities fail clearly.
  5. A task can be replayed or inspected from Craik artifacts without raw chat history.

Adjacent-runtime integration is tracked as a later bridge, not a dependency for this feature.

Feature 12: Skills and plugins

Purpose: Make repeated workflows reusable while keeping authority governed.

MVP behavior: skills are instruction packages scoped to project or runtime · plugins expose typed capabilities · plugin capabilities require grants · plugin actions produce receipts · probationary plugins have restricted permissions · plugins cannot bypass runner or task policy envelopes.

  1. Project-scoped skills override global skills.
  2. Skills can declare required context contracts.
  3. Plugin descriptors validate.
  4. Probationary plugin use is visible in receipts.

Feature 13: Context contracts

Purpose: Define what context a task type must receive.

Docs review

docs paths · implementation references · ADR policy · recent facts · stale-risk list.

Implementation

branch state · test commands · capability grants · relevant issues · coding conventions.

Release work

version policy · changelog policy · package-registry state · CI requirements.

  1. Missing required context blocks execution or creates a warning.
  2. The case file marks satisfied and missing context.
  3. Roles can declare required context contracts.
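A contract check along these lines could mark satisfied and missing context. The required items mirror the three lists above; the key spellings, the CONTRACTS table, and check_contract are assumptions for illustration.

```python
# Required context per task type, taken from the lists above; key spellings are assumed.
CONTRACTS = {
    "docs_review": ["docs_paths", "implementation_references", "adr_policy",
                    "recent_facts", "stale_risk_list"],
    "implementation": ["branch_state", "test_commands", "capability_grants",
                       "relevant_issues", "coding_conventions"],
    "release": ["version_policy", "changelog_policy", "package_registry_state",
                "ci_requirements"],
}


def check_contract(task_type, context, block_on_missing=True):
    """Report satisfied and missing context, and whether execution should block or warn."""
    required = CONTRACTS[task_type]
    satisfied = [k for k in required if context.get(k) not in (None, "", [], {})]
    missing = [k for k in required if k not in satisfied]
    if not missing:
        outcome = "ok"
    else:
        outcome = "blocked" if block_on_missing else "warning"
    return {"satisfied": satisfied, "missing": missing, "outcome": outcome}
```

The block_on_missing flag reflects the first criterion: missing context either blocks execution or produces a warning, depending on configuration.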

Feature 14: Agent reputation

Purpose: Measure reliability without turning it into popularity.

Signals: facts later contradicted · tests passed/failed after edits · policy violations · handoff completeness · review findings accepted · tasks completed without rework.

  1. Reputation is scoped by role/domain.
  2. Metrics are explainable.
  3. Reputation affects routing only when policy enables it.

Scope today

This feature is not in the MVP implementation. Contracts leave room for it; the routing surface remains policy-driven.

Feature 15: Evidence and assumption management

Purpose: Distinguish evidence-backed facts from unverified assumptions.

MVP behavior: case files include evidence references · agent conclusions can be marked as assumptions · assumptions include confidence and verification requirements · memory proposals require evidence references before promotion · handoffs list unresolved assumptions.

  1. Unsupported assertions do not become direct memory writes.
  2. Assumptions are visible in case files and handoffs.
  3. Evidence references can point to files, commands, GitHub objects, Stigmem facts, user instructions, or prior handoffs.
  4. Memory promotion fails when required evidence is missing.
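The promotion gate might look like the following sketch. The evidence kinds mirror the reference targets in criterion 3; the kind labels, the proposal shape, and the promote function are hypothetical.

```python
# Kind labels are assumed spellings of the reference targets listed above.
EVIDENCE_KINDS = {"file", "command", "github", "stigmem_fact", "user_instruction", "handoff"}


def promote(proposal: dict) -> dict:
    """Promote a proposal to a fact only when it carries at least one valid evidence reference."""
    refs = proposal.get("evidence", [])
    valid = [r for r in refs if r.get("kind") in EVIDENCE_KINDS and r.get("ref")]
    if not valid:
        # Unsupported assertions stay visible as assumptions instead of becoming writes.
        return {"promoted": False, "reason": "missing evidence", "status": "assumption"}
    return {"promoted": True, "status": "fact", "evidence": valid}
```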

Feature 16: Agent-native onboarding

Purpose: Give a new agent a safe, current project model before it starts work.

Target command: craik onboard --project <project-id>

MVP output: current project model · active policy profile · relevant ADRs and immutable paths · docs boundaries · recent handoffs · unresolved contradictions · stale-risk warnings · validation commands · Stigmem backend status · allowed next actions.

  1. Onboarding output is generated from the same case-file primitives as tasks.
  2. Stale or missing context is clearly marked.
  3. Policies and write boundaries are visible.
  4. Output is usable by Codex, Claude, and Gemini runner adapters.

Feature 17: Policy tests

Purpose: Make runtime policy behavior testable and regressions visible.

Required policy tests: ADR paths cannot be edited in strict mode · memory writes become proposals by default · trusted-local fail-open still records receipts · automation mode fails closed · runner adapters cannot bypass grants · secrets are redacted from receipts, logs, handoffs, and case files.

  1. Policy tests run in CI once implementation begins.
  2. Failures identify the violated policy.
  3. Every new policy profile must include fixture tests.

Feature 18: Human delegation points

Purpose: Make human approval and clarification part of the work graph.

Delegation point types: approval request · clarification request · policy override request · contradiction adjudication request · memory promotion request · release signoff request.

  1. Delegation points are graph nodes.
  2. Resolution creates receipts.
  3. Unresolved delegation points appear in handoffs.
  4. Agents cannot silently continue past required approvals.

Feature 19: Budget and quota controls

Purpose: Keep agent work operationally bounded.

Budget types: context tokens · model spend · wall-clock time · shell command count · GitHub write count · memory write count · parallel worker count · retry count · human approval count.

  1. Budgets can be set by policy profile.
  2. Budget state appears in case files and receipts.
  3. Budget exhaustion blocks or escalates according to policy.
  4. Fail-open profiles do not bypass budget receipts.
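Budget enforcement can be sketched as a counter that returns receipt-style state on every spend. The Budget class, the budget name in the usage note, and the "block"/"escalate" outcome labels are assumptions.

```python
class Budget:
    def __init__(self, limits, on_exhaust="block"):
        self.limits = dict(limits)
        self.used = {name: 0 for name in limits}
        self.on_exhaust = on_exhaust  # "block" or "escalate", chosen by the policy profile

    def spend(self, name, amount=1):
        """Record usage and return a receipt-style dict describing the budget state."""
        would_use = self.used[name] + amount
        if would_use > self.limits[name]:
            # Exhausted: usage is not recorded, and the policy outcome is reported.
            return {"budget": name, "used": self.used[name],
                    "limit": self.limits[name], "outcome": self.on_exhaust}
        self.used[name] = would_use
        return {"budget": name, "used": would_use,
                "limit": self.limits[name], "outcome": "ok"}
```

For example, Budget({"shell_commands": 2}) allows two spends and reports "block" on the third. Every call returns budget state, so a fail-open profile could still emit a receipt even when it chooses to continue.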

Feature 20: Runtime instruction distillation

Purpose: Convert declared agent-runtime instruction files into structured, scoped, provenance-linked runtime memory.

Sources may include: AGENTS.md · CLAUDE.md · GEMINI.md · HERMES.md · SKILLS.md · .cursorrules · .github/copilot-instructions.md · .codex/instructions.md · declared project policy docs.

  1. Declared sources only.
  2. Source path, hash, timestamp, scope, and line/range provenance are tracked.
  3. Extracted items become proposals by default.
  4. Policy constraints can be promoted by approval.
  5. Contradictions between instruction sources are surfaced.
  6. Stale distillations are invalidated when source hashes change.
  7. Case files cite both the distilled item and its source file.
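Hash-based provenance and staleness can be illustrated with a short sketch. The distill and is_stale helpers and the item keys are hypothetical, and the extraction is deliberately naive (one item per non-empty line); timestamp and scope tracking are omitted.

```python
import hashlib


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def distill(source_path: str, text: str) -> list:
    """Turn non-empty lines into proposals carrying path, hash, and line provenance."""
    digest = _digest(text)
    items = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            items.append({"text": line.strip(), "status": "proposal",
                          "source_path": source_path, "source_hash": digest,
                          "line": lineno})
    return items


def is_stale(item: dict, current_text: str) -> bool:
    """A distilled item is stale once the source file's hash no longer matches."""
    return item["source_hash"] != _digest(current_text)
```

Storing the source hash on each item lets a later run invalidate distillations by rehashing the file, without re-parsing it first.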

Feature 21: Intent, scratchpad, and scope control

Purpose: Keep agent work aligned while allowing temporary thinking.

Capabilities: task intent lock · scratchpad with expiry · scope-change proposal · first-class unknowns · structured context requests · context-debt tracking.

  1. Task execution references the accepted intent lock.
  2. Scratchpad entries expire unless promoted.
  3. Out-of-scope discoveries create scope-change proposals.
  4. Unknowns identify what is needed to resolve them.
  5. Context requests are recorded.
  6. Context debt appears in handoffs.
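Scratchpad expiry could work roughly like this. The Scratchpad class is hypothetical, and the clock is passed in explicitly as a number of seconds so the behavior is deterministic in tests.

```python
class Scratchpad:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}

    def add(self, key, note, now):
        self._entries[key] = {"note": note, "expires_at": now + self.ttl, "promoted": False}

    def promote(self, key):
        # Promoted entries survive past their expiry time.
        self._entries[key]["promoted"] = True

    def live(self, now):
        """Return notes that are promoted or not yet expired at time `now`."""
        return {k: e["note"] for k, e in self._entries.items()
                if e["promoted"] or now < e["expires_at"]}
```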

Feature 22: Runtime quality gates

Purpose: Improve output quality before durable handoff or memory writes.

Capabilities: self-audit before handoff · runtime critic · red team mode · handoff quality score · evidence coverage score · tool result attestation · agent exit discipline.

  1. Major handoffs include self-audit results.
  2. Critic findings are typed and actionable.
  3. High-risk tasks can require red-team review.
  4. Test / command claims distinguish runtime-observed from agent-reported.
  5. Incomplete runs still produce useful exit handoffs.
  6. Low-quality handoffs can block memory promotion.

Feature 23: Runtime intelligence and routing

Purpose: Make Craik smarter about runners, evidence, artifacts, and continuity.

Capabilities: runner capability matrix · agent workload memory · known traps · evidence expiration rules · knowledge freshness probe · policy-aware prompt compiler · real-runner contract tests · work product classification · "what changed since last time" deltas.

  1. Runner selection can account for capabilities and workload memory.
  2. Known traps appear in onboarding and case files.
  3. Stale evidence can trigger freshness probes.
  4. Prompts compile from shared contracts into runner-specific forms.
  5. Adapter contract tests run against fixture tasks.
  6. Artifacts carry class and lifecycle metadata.
  7. Task start can show relevant deltas since prior runs.

What's next