Differentiators
What you'll learn
- The features that keep Craik's roadmap from collapsing into basic CLI, storage, and adapter work.
- Why every durable assertion must be traceable to evidence.
- How Craik separates raw agent output from organizational truth.
- The runtime primitives — assumption ledger, scope locks, scratchpad expiry, context debt, runtime critic — that make agent work auditable.
Don't become a generic agent launcher.
Craik's differentiators center on durable, governed, evidence-backed agent work. This doc captures the features that should keep the roadmap honest as it grows.
Evidence-first execution
Every durable conclusion should be traceable to evidence.
Runtime rule: No durable assertion without evidence. Craik may allow low-confidence assumptions, but they must not be promoted to durable facts without evidence.
Evidence sources: file reads · command output · GitHub issues, PRs, comments, checks · Stigmem facts · user instructions · web sources · prior handoffs · generated artifacts · runner outputs.
Assumption ledger
Agents make assumptions constantly. Craik should separate assumptions from facts. Each assumption captures statement · source · confidence · task context · verification requirement · expiration · whether action is allowed before verification. Assumptions are visible in case files, handoffs, and memory diffs.
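A minimal sketch of one ledger entry, assuming a Python representation. The field names mirror the list above; the class name `Assumption`, the `may_act` helper, the 0-to-1 confidence scale, and the rule that expired assumptions block action are illustrative assumptions, not Craik's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Assumption:
    """Hypothetical ledger entry; fields follow the list in the text."""
    statement: str
    source: str
    confidence: float              # assumed scale: 0.0 to 1.0
    task_context: str
    verification_requirement: str
    expires_at: datetime
    action_allowed_before_verification: bool
    verified: bool = False

    def may_act(self, now: datetime) -> bool:
        """Return True if an agent may act on this assumption right now."""
        if now >= self.expires_at:
            return False           # assumed rule: expired assumptions block action
        return self.verified or self.action_allowed_before_verification


entry = Assumption(
    statement="The default branch is main",
    source="agent inference",
    confidence=0.6,
    task_context="docs update",
    verification_requirement="check repo state",
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
    action_allowed_before_verification=False,
)
```

Keeping the "may I act on this yet?" decision next to the data makes the verification requirement hard to skip by accident.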
Belief promotion workflow
Craik should distinguish raw agent output from organizational truth.
observed → proposed → accepted → relied_upon → stale → invalidated
The lifecycle applies to memory proposals and eventually to selected Stigmem facts through metadata or companion facts.
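The lifecycle can be sketched as a small state machine. This is a sketch under stated assumptions: the state names come from the arrow chain above, but the strictly one-step-forward transition table, the choice of which states count as durable, and the `promote` function are hypothetical. A real implementation might, for example, allow invalidation from any state.

```python
from enum import Enum


class BeliefState(Enum):
    OBSERVED = "observed"
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    RELIED_UPON = "relied_upon"
    STALE = "stale"
    INVALIDATED = "invalidated"


# Assumed transition table: each state may only move to the next one.
ORDER = list(BeliefState)
NEXT = {a: b for a, b in zip(ORDER, ORDER[1:])}

# Assumed: "accepted" and "relied_upon" are the durable states.
DURABLE = {BeliefState.ACCEPTED, BeliefState.RELIED_UPON}


def promote(state: BeliefState, evidence: list[str]) -> BeliefState:
    """Advance one step, refusing evidence-free promotion to a durable state."""
    if state not in NEXT:
        raise ValueError(f"{state.value} is terminal")
    target = NEXT[state]
    if target in DURABLE and not evidence:
        raise PermissionError("no durable assertion without evidence")
    return target
```

The evidence check sits inside the transition itself, so the runtime rule from the evidence-first section cannot be bypassed by calling a different code path.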
Context budgeting as policy
Context assembly should be explainable. Case files capture why each item was included · what was summarized · what was excluded · what was omitted due to budget · what must be fetched on demand · whether omissions create risk.
Agent run reproducibility
Run records link the full provenance chain so reviewers can replay operationally — not deterministically — what an agent knew and what it was allowed to do.
Trust boundaries between agents
Codex, Claude, Gemini, and future runners are not equally trusted by default. Policy controls whether a runner may propose facts · write facts · edit files · run shell · open issues or PRs · approve another agent's work · resolve contradictions · use fail-open profiles.
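One way to picture per-runner policy is a deny-by-default lookup table. The table contents below are placeholders chosen for illustration and say nothing about how any real runner is trusted; `RUNNER_GRANTS`, `is_allowed`, and the capability strings are hypothetical names.

```python
# Hypothetical policy table: runner name -> capabilities it has been granted.
RUNNER_GRANTS: dict[str, frozenset[str]] = {
    "codex": frozenset({"propose_facts", "edit_files", "run_shell"}),
    "claude": frozenset({"propose_facts", "edit_files", "open_prs"}),
    "gemini": frozenset({"propose_facts"}),
}


def is_allowed(runner: str, capability: str) -> bool:
    """Deny by default: unknown runners and unlisted capabilities get nothing."""
    return capability in RUNNER_GRANTS.get(runner, frozenset())
```

Because the default is an empty grant set, adding a new runner requires an explicit policy entry before it can do anything.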
Cross-agent review protocol
Craik uses explicit review roles instead of a single orchestrator/specialist decomposition.
- Implementer: does the primary work.
- Verifier: runs validation and confirms claims.
- Adversarial reviewer: finds gaps and unsupported claims.
- Policy reviewer: checks governance compliance.
- Documentation reviewer: aligns docs with implementation.
- Memory curator: maintains memory hygiene over time.
- Release reviewer: acts as the gate before publication.
- Adjudicator: resolves disagreements.
Review outputs are typed, evidence-linked, and graph-connected.
Staleness as a first-class signal
Old truths are a major failure mode. Craik surfaces staleness for facts · docs · handoffs · assumptions · GitHub issue state · branch state · runner outputs · generated artifacts · project policies. Every case file says what's fresh, stale, or unknown.
Decision record suggestions
Craik notices when runtime knowledge is becoming durable project policy.
Signals: repeated reliance on the same fact · resolved contradictions that affect future behavior · recurring policy overrides · repeated docs updates from the same root cause · cross-agent agreement on an architectural constraint.
Craik suggests that maintainers create or update ADRs — it does not write them automatically.
Agent-native onboarding
craik onboard --project <project-id> outputs the canonical bundle a new runner needs.
Output: current project model · active policies · relevant ADRs · docs boundaries · recent handoffs · unresolved contradictions · validation commands · Stigmem connection status · known traps · allowed next actions.
Provenance-aware documentation
For generated or updated docs, Craik records source facts · source files · source issues/PRs · relevant policies · validation commands · authoring agent · review agent · update timestamp. Documentation stays tied to the evidence that justified it.
Policy tests
Craik policies are testable. Policy tests run both in CI and as fixture-based local tests.
- Immutable paths: ADRs cannot be edited under strict mode.
- Memory proposal default: memory writes become proposals unless a grant allows the write.
- Trusted-local receipts: fail-open still seals receipts.
- Automation fail-closed: automation mode stops instead of widening.
- Grant boundaries: runner adapters cannot bypass grants.
- Redaction regressions: secrets are scrubbed from receipts and handoffs.
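As an illustration of what a fixture-based policy test might look like, here is a sketch of a redaction regression check. The `redact` function and its pattern are stand-ins written for this example, not Craik's redaction logic, and the key names matched are assumptions.

```python
import re

# Illustrative pattern only: matches "token=...", "password=...", "api_key=...".
SECRET_PATTERN = re.compile(r"(?i)\b(token|password|api_key)=(\S+)")


def redact(text: str) -> str:
    """Replace secret values with a fixed marker, keeping the key name."""
    return SECRET_PATTERN.sub(r"\1=[REDACTED]", text)


def test_receipt_redaction() -> None:
    # Fixture: a receipt line that contains a secret value.
    receipt = "ran deploy with token=abc123 on host build-1"
    scrubbed = redact(receipt)
    assert "abc123" not in scrubbed
    assert "token=[REDACTED]" in scrubbed
```

A test in this shape can be collected by a runner such as pytest locally and in CI, which gives both environments the same fixture.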
Human delegation points
Human involvement is a runtime primitive, not an interruption.
Delegation kinds: approval request · clarification request · policy override request · contradiction adjudication request · memory promotion request · release signoff request.
Delegation points become graph nodes, appear in handoffs, and produce receipts when resolved.
Budget and quota controls
Budgets bound agent work with operational limits visible in case files and receipts: context tokens · model spend · wall-clock time · shell command count · GitHub write count · memory write count · parallel worker count · retry count · human approval count.
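A budget of this kind can be sketched as a set of counters with hard limits. The `Budget` class, the `BudgetExceeded` exception, and the counter name used below are hypothetical; the sketch assumes every limit is an integer count.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a tracked counter would pass its limit."""


@dataclass
class Budget:
    limits: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)

    def spend(self, kind: str, amount: int = 1) -> int:
        """Record usage and return what remains; refuse to overspend."""
        limit = self.limits[kind]           # KeyError for untracked kinds
        total = self.used.get(kind, 0) + amount
        if total > limit:
            raise BudgetExceeded(f"{kind}: {total} > {limit}")
        self.used[kind] = total
        return limit - total
```

For example, `Budget(limits={"shell_commands": 2})` accepts two `spend("shell_commands")` calls and raises on the third, leaving the `used` counters available for a case file or receipt.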
Learning without self-trust
Agents may propose facts · skills · policy refinements · validation commands · docs updates · decision record suggestions · plugin ideas. Promotion always requires evidence, policy, review, or explicit approval.
The self-trust rule
Craik may learn continuously, but it should not self-certify truth.
This principle guides every self-improving feature.
Runtime instruction distillation
Craik turns declared agent-runtime instruction files into structured runtime memory.
Recognized sources: AGENTS.md · CLAUDE.md · GEMINI.md · HERMES.md · SKILLS.md · .cursorrules · .github/copilot-instructions.md · .codex/instructions.md · project policy docs explicitly listed in the project profile.
Source Markdown remains canonical. Distilled output is a provenance-linked runtime projection.
Distilled categories: instruction · policy · preference · command · boundary · handoff rule · memory rule · security rule · stale-risk.
Distillations track source path, source hash, line/range, scope, timestamp, and extraction confidence. Extracted items become proposals by default and are invalidated when the source hash changes.
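Hash-based invalidation can be sketched with the standard library. The record below follows the tracked fields named above (scope and timestamp are omitted for brevity); the names `DistilledItem`, `hash_source`, and `revalidate`, the status strings, and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass


def hash_source(content: str) -> str:
    """SHA-256 of the source text, used to detect edits."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class DistilledItem:
    """Hypothetical record; fields follow the tracking list in the text."""
    category: str
    text: str
    source_path: str
    source_hash: str
    line_range: tuple[int, int]
    confidence: float
    status: str = "proposed"    # extracted items start as proposals


def revalidate(item: DistilledItem, current_content: str) -> DistilledItem:
    """Mark the item invalidated when the source no longer matches its hash."""
    if hash_source(current_content) != item.source_hash:
        item.status = "invalidated"
    return item
```

Hashing the whole file is deliberately coarse: any edit invalidates every item distilled from it, which errs on the side of re-extraction rather than trusting a stale projection.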
Task intent lock
Craik freezes the accepted task intent before execution. The lock captures original request · accepted interpretation · excluded work · allowed autonomy · stop conditions · scope-change rules — giving agents a stable north star and making scope drift reviewable.
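"Freezing" maps naturally onto an immutable record. The sketch below assumes a Python frozen dataclass; the class name `IntentLock` and the example values are hypothetical, while the fields follow the list above.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class IntentLock:
    original_request: str
    accepted_interpretation: str
    excluded_work: tuple[str, ...]
    allowed_autonomy: str
    stop_conditions: tuple[str, ...]
    scope_change_rules: str


lock = IntentLock(
    original_request="Fix the broken link in the README",
    accepted_interpretation="Update one URL in README.md",
    excluded_work=("restructuring the README",),
    allowed_autonomy="edit README.md only",
    stop_conditions=("link target cannot be determined",),
    scope_change_rules="file a scope-change proposal",
)
```

Any later attempt to assign to a field, such as `lock.accepted_interpretation = "..."`, raises `FrozenInstanceError`, so a changed interpretation has to arrive as a new, reviewable lock rather than a silent edit.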
Scratchpad with expiry
Working memory that is not durable truth. Scratchpad space holds temporary notes · candidate hypotheses · partial findings · links to inspect · unresolved fragments — and expires at task end unless promoted to assumptions, facts, handoffs, or artifacts.
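Expiry-unless-promoted can be sketched as follows. The `Scratchpad` class and its method names are hypothetical; the four promotion targets are the ones named above.

```python
from dataclasses import dataclass, field


@dataclass
class Scratchpad:
    notes: dict[str, str] = field(default_factory=dict)
    promoted: dict[str, str] = field(default_factory=dict)  # note id -> target kind

    def add(self, note_id: str, text: str) -> None:
        self.notes[note_id] = text

    def promote(self, note_id: str, target: str) -> None:
        """Mark a note for survival; target kinds follow the text."""
        if target not in {"assumption", "fact", "handoff", "artifact"}:
            raise ValueError(f"unknown promotion target: {target}")
        if note_id not in self.notes:
            raise KeyError(note_id)
        self.promoted[note_id] = target

    def end_task(self) -> dict[str, tuple[str, str]]:
        """Expire everything and return only promoted notes with their targets."""
        survivors = {
            note_id: (target, self.notes[note_id])
            for note_id, target in self.promoted.items()
        }
        self.notes.clear()
        self.promoted.clear()
        return survivors
```

Expiry is the default path here: a note survives only if someone made an explicit promotion decision before the task ended.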
Negative knowledge
Useful dead ends are preserved with freshness rules.
- Approaches rejected: what has already been tried and didn't work.
- Failed commands: commands that errored, and why.
- Non-existent APIs: endpoints checked and not found.
- Irrelevant files: files inspected and found unrelated.
- Disproven assumptions: claims refuted by evidence.
- Unavailable names: package or registry names checked and not free.
Absence can change — freshness rules apply to negative knowledge too.
Capability dry run
Before granting side-effecting capabilities, an agent previews intended actions: files expected to change · shell commands expected to run · GitHub writes expected · facts expected to be proposed or written · policy triggers · approvals likely needed. The runtime then grants narrower authority.
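"Grants narrower authority" can be read as a set intersection between what the agent previewed and what policy permits. That reading, along with the `Preview`, `Grant`, and `narrow` names and the restriction to two capability kinds, is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preview:
    """What the agent says it intends to do."""
    files: frozenset[str]
    shell_commands: frozenset[str]


@dataclass(frozen=True)
class Grant:
    """What the runtime actually authorizes."""
    files: frozenset[str]
    shell_commands: frozenset[str]


def narrow(preview: Preview, policy_ceiling: Grant) -> Grant:
    """Grant only what was both previewed and permitted by policy."""
    return Grant(
        files=preview.files & policy_ceiling.files,
        shell_commands=preview.shell_commands & policy_ceiling.shell_commands,
    )
```

Under this model a previewed file outside the policy ceiling is simply not granted, and a permitted command the agent never previewed is not granted either.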
Evidence coverage score
A real coverage signal, not a fake certainty score.
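This section does not define how the score is computed. One plausible reading, sketched here purely as an assumption, is the fraction of claims that carry at least one evidence reference. The function name and the input shape (claim text mapped to a list of evidence references) are hypothetical.

```python
from typing import Optional


def evidence_coverage(claims: dict[str, list[str]]) -> Optional[float]:
    """Fraction of claims with at least one evidence reference.

    Returns None for an empty claim set rather than implying full coverage.
    """
    if not claims:
        return None
    backed = sum(1 for refs in claims.values() if refs)
    return backed / len(claims)
```

A ratio like this measures how much of the output is backed by evidence. It makes no claim about whether that evidence is correct, which is what separates a coverage signal from a certainty score.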
Structured agent debate
When agents disagree, Craik structures the disagreement. Debate records capture claim · evidence · counterclaim · counter-evidence · missing verification · adjudicator decision · resulting memory updates.
Self-audit before handoff
Before finishing, agents run a standard self-audit.
- Answered the locked intent.
- Stayed in scope.
- Cited evidence.
- Recorded assumptions.
- Recorded validation.
- Created needed facts or proposals.
- Avoided forbidden paths.
- Left next steps.
- Produced a useful handoff.
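The checklist above can be encoded so that an unanswered item counts as a failure. The identifiers below are hypothetical snake_case renderings of the nine items, and `failed_items` is an illustrative helper.

```python
SELF_AUDIT_ITEMS = (
    "answered_locked_intent",
    "stayed_in_scope",
    "cited_evidence",
    "recorded_assumptions",
    "recorded_validation",
    "created_needed_facts_or_proposals",
    "avoided_forbidden_paths",
    "left_next_steps",
    "produced_useful_handoff",
)


def failed_items(results: dict[str, bool]) -> list[str]:
    """Return checklist items that are missing or answered False, in order."""
    return [item for item in SELF_AUDIT_ITEMS if not results.get(item, False)]
```

Treating a missing answer the same as `False` means an agent cannot pass the audit by skipping a question.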
Context debt tracking
When context is omitted, summarized, or deferred because of budget, Craik tracks omitted item · reason · risk · required follow-up · whether the current task may proceed. Context debt is durable; the next run inherits it as carryover.
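A sketch of a debt record and its carryover, assuming debts are kept in a simple list. The fields follow the list above; the `resolved` flag, the risk level strings, and the `carry_over` and `blocked` helpers are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextDebt:
    omitted_item: str
    reason: str
    risk: str                 # assumed levels: "low", "medium", "high"
    required_follow_up: str
    may_proceed: bool
    resolved: bool = False


def carry_over(debts: list[ContextDebt]) -> list[ContextDebt]:
    """Debts the next run inherits: everything not yet resolved."""
    return [d for d in debts if not d.resolved]


def blocked(debts: list[ContextDebt]) -> bool:
    """The current task is blocked if any open debt forbids proceeding."""
    return any(not d.may_proceed for d in carry_over(debts))
```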
Tool result attestation
Different result sources have different trust profiles.
Important claims like "tests passed" should require runtime-observed receipts whenever possible.
Runtime memory hygiene
Curator workflows for memory quality. Curator tasks find stale assumptions · duplicate facts · unpromoted useful proposals · weak-evidence facts · contradictions · expired handoffs · obsolete negative knowledge. Cleanup is proposed, never automatically destructive by default.
Recovery mode
Interrupted runs are resumable. Recovery uses task request · intent lock · case file · policy envelope · partial receipts · scratchpad · changed files · unfinished handoff · unresolved delegations · memory proposals. Incomplete runs still leave useful handoffs.
Runner capability matrix
Craik knows what each runner can do — and routes accordingly.
Capabilities tracked: shell access · file patching · browser/web access · MCP support · image input · structured output · long context · background tasks · approval flow · tool-call reliability.
The matrix influences runner selection, prompt compilation, and policy grants.
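Routing from a capability matrix can be sketched as a subset check. The runner names and matrix entries below are placeholders, not assessments of real runners, and `eligible_runners` is a hypothetical helper.

```python
# Hypothetical matrix; the entries are placeholders, not real runner assessments.
MATRIX: dict[str, set[str]] = {
    "runner-a": {"shell_access", "file_patching", "structured_output"},
    "runner-b": {"file_patching", "long_context", "image_input"},
}


def eligible_runners(required: set[str]) -> list[str]:
    """Runners whose tracked capabilities cover every required one."""
    return sorted(name for name, caps in MATRIX.items() if required <= caps)
```

An empty result is itself useful information: it tells the runtime that no known runner can take the task as specified.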
Scope change protocol
When an agent finds work outside the locked intent, it files a scope-change proposal capturing requested scope change · rationale · evidence · risk · whether current work is blocked · recommended action.
Knowledge freshness probe
Before relying on stale or high-impact facts, Craik can refresh relevant state.
Probe targets: repo state · GitHub state · package registries · Stigmem facts · local command output · web sources (when allowed).
Public / internal boundary classifier
Craik classifies where content belongs and helps prevent internal-only labels or implementation tracking details from leaking into public docs.
Targets: public docs · internal docs · issue or PR comments · memory facts · handoffs · release notes · audit artifacts.
Runtime context explanations
Every case-file item is explainable. Agents should be able to ask, "Why am I seeing this?" and get a real answer.
- Policy required: included because policy mandates it.
- Recent handoff: included because a recent handoff referenced it.
- Contradiction: included because it contradicts a current assumption.
- Stale + high-risk: included because it is stale but high-risk.
- Task-type: included because the task type requires it.
Structured context requests
Agents request more context through a structured protocol. Fields: need · reason · urgency · allowed source scope · blocking status · expected output shape. Craik fulfills requests through safe channels and records the result.
First-class unknowns
Agents say "unknown" without being treated as incomplete. Unknowns identify whether resolution requires web access · user input · repo inspection · privileged tool use · Stigmem query · waiting for external state.
Runtime critic
A structured critic pass before accepting major outputs.
- Unsupported claims: claims without evidence references.
- Policy violations: actions that crossed the policy envelope.
- Scope drift: work outside the intent lock.
- Missing validation: claims unverified by command or test.
- Stale evidence: citations that may have moved.
- Missing handoff: runs that didn't close cleanly.
- Unredacted content: sensitive data that slipped through.
- Risky memory writes: promotions without sufficient evidence.
Agent workload memory
Routing memory, not social reputation. Craik remembers which agents and runners perform well on which work.
Signal examples: strong at docs reconciliation · weak at shell-heavy debugging · strong at policy review · tends to miss stale GitHub state · needs stricter context · produces high-quality handoffs.
Known traps
Projects maintain known traps — negative knowledge appearing in onboarding and case files.
- Don't edit ADRs
- Public docs can't reference internal labels
- Tests must run outside the sandbox
- Generated docs live elsewhere
- Local node advertises a non-standard port
- Package version is intentionally pre-release
Evidence expiration rules
Different evidence kinds have different shelf lives.
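Per-kind shelf lives can be sketched as a lookup of maximum ages. The durations below are placeholders chosen only to make the example run, not values Craik defines; the kind names and the `freshness` function are likewise hypothetical. The three result labels match the fresh, stale, or unknown vocabulary used for case files.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder shelf lives, chosen only for illustration.
SHELF_LIFE: dict[str, timedelta] = {
    "command_output": timedelta(hours=1),
    "github_issue_state": timedelta(days=1),
    "user_instruction": timedelta(days=30),
}


def freshness(kind: str, observed_at: Optional[datetime], now: datetime) -> str:
    """Classify evidence as "fresh", "stale", or "unknown"."""
    if observed_at is None or kind not in SHELF_LIFE:
        return "unknown"
    return "fresh" if now - observed_at <= SHELF_LIFE[kind] else "stale"
```

Returning "unknown" for evidence kinds without a rule avoids silently treating unclassified evidence as fresh.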
Handoff quality score
Handoffs are checked for completeness.
Signals: completed work · changed files · validation · assumptions · unresolved questions · next steps · facts proposed or written · receipts · context debt · delegation status.
Policy-aware prompt compiler
Craik compiles runner-specific prompts from the same underlying runtime contracts.
Inputs: locked task intent · policy envelope · context contract · runner capabilities · evidence · assumptions · allowed tools · output schema.
Codex, Claude, and Gemini may need different prompt shapes, but the underlying truth is shared.
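The idea of one contract rendered in several shapes can be sketched as below. The two shapes ("json" and "sections"), the contract keys, and the schema name are invented for illustration and do not describe what any particular runner expects.

```python
import json

# Hypothetical shared contract; keys are a subset of the inputs listed above.
contract = {
    "intent": "Update one URL in README.md",
    "allowed_tools": ["file_patching"],
    "output_schema": "handoff-v1",   # hypothetical schema name
}


def compile_prompt(contract: dict, shape: str) -> str:
    """Render the same contract in a runner-specific shape."""
    if shape == "json":
        return json.dumps(contract, sort_keys=True)
    if shape == "sections":
        tools = ", ".join(contract["allowed_tools"])
        return (
            f"TASK: {contract['intent']}\n"
            f"TOOLS: {tools}\n"
            f"OUTPUT: {contract['output_schema']}"
        )
    raise ValueError(f"unknown prompt shape: {shape}")
```

Both renderings read from the same `contract` object, so a change to the locked intent or allowed tools shows up in every runner's prompt without per-runner edits.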
Real-runner contract tests
Mocks are not enough for runner adapters. Craik periodically tests Codex, Claude, and Gemini adapters against fixture tasks and verifies that outputs conform to Craik contracts.
Memory impact preview
Before writing facts to Stigmem, Craik shows a memory-diff preview: facts to add · facts to invalidate · contradictions likely to open · affected case files / handoffs / docs · scope and visibility · confidence · evidence.
Agent exit discipline
Agents that cannot complete a task still leave useful state.
Incomplete exits include: why blocked · what was checked · what is safe to continue · what is unsafe · missing context · unresolved delegations · next best action.
Red team mode
High-risk tasks support a stricter reviewer mode. Checks include leaked secrets · public/internal boundary violations · unsupported claims · unsafe command grants · bad memory writes · policy bypasses · misleading docs updates.
Work product classification
Every artifact has a type and lifecycle, and the class drives policy.
- Scratch: expires at task end.
- Proposal: awaits review.
- Implementation: the primary deliverable.
- Review: cross-agent review output.
- Decision: ADR or equivalent.
- Release: versioned, signed.
- Public docs: external-facing.
- Internal docs: operator-only.
- Memory update: fact-store delta.
- Audit artifact: receipt, handoff, graph export.
What changed since last time
Before an agent starts, Craik shows relevant deltas since the last related run — continuity without forcing rediscovery.
Tracked deltas: files changed · facts changed · issues changed · PRs changed · policies changed · handoffs added · contradictions opened or resolved · package versions changed.
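Computing deltas between two runs can be sketched as a comparison of snapshots. The snapshot shape (an item identifier mapped to a version or content hash) and the `diff_snapshots` name are assumptions; the same function would apply to files, facts, policies, or package versions.

```python
def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {item id: version or hash} snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            k for k in before.keys() & after.keys() if before[k] != after[k]
        ),
    }
```

Unchanged items are left out of the result, which keeps the delta shown to the agent proportional to what actually moved.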