Version: MVP

Instruction sources

Reference · Updated 2026-05-21

What you'll find here

The registry, hash state, provenance, categories, stale invalidation, conflict handling, and promotion rules that turn declared instruction files into runtime constraints.

Not every Markdown file is authority.

Craik does not treat raw instruction files as runtime authority. Sources declare candidate evidence; promotion is a separate gate.

Supported sources

| Kind | Path |
| --- | --- |
| agents_md | AGENTS.md |
| claude_md | CLAUDE.md |
| gemini_md | GEMINI.md |
| hermes_md | HERMES.md |
| skills_md | SKILLS.md |
| cursor_rules | .cursorrules |
| github_copilot_instructions | .github/copilot-instructions.md |
| codex_instructions | .codex/instructions.md |
| policy_doc | Explicitly declared project policy doc path |

Standard source kinds must use their canonical path. policy_doc sources declare their own path and must be listed in the registry's declared_policy_doc_paths.
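
The path rule above can be illustrated with a small check. This is a minimal sketch, not Craik's implementation: the kind-to-path mapping is copied from the table, while the function name validate_declared_source and its use of ValueError are assumptions made for the example.

```python
# Canonical paths copied from the supported-sources table.
CANONICAL_PATHS = {
    "agents_md": "AGENTS.md",
    "claude_md": "CLAUDE.md",
    "gemini_md": "GEMINI.md",
    "hermes_md": "HERMES.md",
    "skills_md": "SKILLS.md",
    "cursor_rules": ".cursorrules",
    "github_copilot_instructions": ".github/copilot-instructions.md",
    "codex_instructions": ".codex/instructions.md",
}


def validate_declared_source(kind: str, path: str, declared_policy_doc_paths: list[str]) -> None:
    """Hypothetical helper: raise ValueError if (kind, path) breaks the path rule."""
    if kind == "policy_doc":
        # policy_doc sources choose their own path, but it must be declared.
        if path not in declared_policy_doc_paths:
            raise ValueError(f"policy_doc path {path!r} is not in declared_policy_doc_paths")
        return
    if kind not in CANONICAL_PATHS:
        raise ValueError(f"unknown source kind {kind!r}")
    if path != CANONICAL_PATHS[kind]:
        raise ValueError(f"{kind} must use canonical path {CANONICAL_PATHS[kind]!r}, got {path!r}")
```

For instance, validate_declared_source("claude_md", "docs/CLAUDE.md", []) raises under this sketch, because claude_md is a standard kind with a fixed path.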

Declared paths stay inside the project.

Instruction ingestion resolves every source path under the registered project root and rejects paths that escape it. Standard Markdown-shaped sources emit one candidate per bullet outside fenced code blocks. Declared policy documents are captured as one free-form statement block so policy prose keeps its review context.
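
A rough sketch of the two behaviours described above, path confinement and per-bullet candidate extraction. The function names are hypothetical, and the bullet parser is deliberately simplified: it handles single-line bullets only and treats any line opening with three backticks or tildes as a fence toggle.

```python
from pathlib import Path

FENCE_MARKERS = ("`" * 3, "~" * 3)  # opening/closing markers of fenced code blocks


def resolve_inside_root(project_root: Path, declared_path: str) -> Path:
    """Resolve declared_path under project_root; raise ValueError if it escapes."""
    root = project_root.resolve()
    candidate = (root / declared_path).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{declared_path!r} escapes the project root")
    return candidate


def bullet_candidates(markdown: str) -> list[str]:
    """Return one candidate per bullet line found outside fenced code blocks."""
    candidates = []
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE_MARKERS):
            in_fence = not in_fence  # entering or leaving a fenced block
            continue
        if in_fence:
            continue
        if stripped.startswith(("- ", "* ", "+ ")):
            candidates.append(stripped[2:].strip())
    return candidates
```

Resolving before comparing is the important design choice here: it collapses ".." segments and symlinks, so a path such as "../outside.md" is rejected even though it is written relative to the root.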

Detection order

Craik never scans arbitrary Markdown as authority. A source becomes eligible only after registration, and downstream ingestion follows the registered source list in deterministic order:

  1. Canonical root instruction files: AGENTS.md, CLAUDE.md, GEMINI.md, HERMES.md, SKILLS.md.
  2. Tool-specific instruction files: .cursorrules, .github/copilot-instructions.md, .codex/instructions.md.
  3. Explicitly declared policy_doc paths in registry order.

Within a project, the runtime registry stores active sources in stable kind/path order so repeated registration and ingestion passes produce predictable downstream records.
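
One way to picture the deterministic order is as a sort key. The sketch below is an assumption-laden illustration: it orders standard kinds by the position of their file in the detection list and policy_doc sources by their position in declared_policy_doc_paths. The function name ingestion_order and the (kind, path) tuple shape are hypothetical.

```python
# Kinds in the order the detection list names their files.
DETECTION_ORDER = [
    "agents_md", "claude_md", "gemini_md", "hermes_md", "skills_md",      # step 1
    "cursor_rules", "github_copilot_instructions", "codex_instructions",  # step 2
]
KIND_RANK = {kind: rank for rank, kind in enumerate(DETECTION_ORDER)}


def ingestion_order(sources: list[tuple[str, str]],
                    declared_policy_doc_paths: list[str]) -> list[tuple[str, str]]:
    """Sort (kind, path) pairs: standard kinds first, then policy docs in registry order."""
    def sort_key(source: tuple[str, str]) -> tuple[int, int]:
        kind, path = source
        if kind == "policy_doc":
            # Step 3: policy docs come last, in the order the registry declares them.
            return (len(DETECTION_ORDER), declared_policy_doc_paths.index(path))
        return (KIND_RANK[kind], 0)

    return sorted(sources, key=sort_key)
```

Because the key depends only on the registered data, repeated passes over the same registry yield the same sequence regardless of input order.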

Registry boundaries

craik.instruction_source_registry is project-scoped. It records declared sources, active source IDs, and policy doc paths. Active source IDs must refer to registered active sources.

Discovery, not approval.

The registry is a discovery boundary, not an approval boundary. Later distillation and promotion steps must still preserve provenance, stale-source state, contradiction reports, and human approval before extracted instructions become active runtime constraints.

Registration is a receipted discovery action. The runtime register_source API writes four records:

craik.instruction_source

The current declared source entry used by ingestion.

craik.instruction_source_registration

The immutable registration event with owner, actor, path, trust boundary, and optional content hash.

craik.instruction_registry_receipt

The audit record proving the registration action completed.

craik.instruction_source_registry

The project-level registry containing active source IDs and policy document paths.

Hash state and provenance

craik.instruction_source_snapshot records observed source identity with a sha256 content hash when the source is present. Snapshot refresh reads each active registered source relative to the project repo root, applies the same path confinement as ingestion, normalizes line endings to \n, and hashes the normalized bytes. Each refresh also records byte count and line count for present files.
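
The normalize-then-hash step can be sketched as follows. The helper name snapshot_content is hypothetical, and three details are assumptions rather than documented behaviour: lone carriage returns are normalized as well, the byte count is taken after normalization, and the line count comes from splitting the normalized bytes.

```python
import hashlib


def snapshot_content(raw: bytes) -> dict:
    """Hypothetical helper: hash and measure a source after line-ending normalization."""
    # Convert CRLF first, then any remaining lone CR (assumption), to "\n".
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return {
        "content_hash": hashlib.sha256(normalized).hexdigest(),
        "byte_count": len(normalized),
        "line_count": len(normalized.splitlines()),
    }
```

Normalizing before hashing means a checkout with CRLF line endings and one with LF line endings produce the same content hash, so a line-ending change alone does not register as drift.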

| Hash status | Meaning | Notes |
| --- | --- | --- |
| unchanged | stable | Observed hash matches the previous known state. |
| changed | drift | Source exists and differs from the previous known state. |
| missing | gone | Declared source was not found. The snapshot must not include a content hash. |
| new | first time | Source exists but has no previous known state. |
| oversize | skipped | Source exceeds the per-file snapshot budget and is excluded from proposal ingestion. |
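
The status values above can be read as a small decision function. This is a sketch under stated assumptions: the budget value MAX_SNAPSHOT_BYTES is invented for the example, the function name hash_status is hypothetical, and the precedence of the checks (missing, then oversize, then new) is a guess rather than documented behaviour.

```python
from typing import Optional

MAX_SNAPSHOT_BYTES = 256 * 1024  # hypothetical per-file snapshot budget


def hash_status(previous_hash: Optional[str],
                current_hash: Optional[str],
                size_bytes: Optional[int]) -> str:
    """Classify a source; size_bytes is None when the declared file was not found."""
    if size_bytes is None:
        return "missing"      # no file, so no content hash either
    if size_bytes > MAX_SNAPSHOT_BYTES:
        return "oversize"     # skipped and excluded from proposal ingestion
    if previous_hash is None:
        return "new"          # present, but no previous known state
    return "unchanged" if current_hash == previous_hash else "changed"
```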

refresh_project_snapshots(store, project_id) persists the current snapshot set before returning it. The returned set is the input to stale invalidation, which compares it with the prior stored snapshot for each source. Because the comparison uses the prior snapshot, a refresh that has already been persisted can still defer distillations whose source changed, disappeared, or was newly observed.

craik.instruction_provenance links each parsed statement candidate back to its source snapshot. The extractor emits one provenance record per statement with deterministic ID, source ID, snapshot ID, path, line range, column range, summary, and excerpt_hash. The summary is the first non-empty statement line capped at 200 characters. The excerpt hash is the SHA-256 digest of the extracted statement text, so future quote readers can verify the source excerpt before display.

Provenance uses a precise line range when available or a source-level fallback when the extractor cannot identify stable lines. Partial line ranges are invalid because they make review ambiguous.
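
A simplified picture of a provenance record, covering the summary cap, the excerpt hash, and the rule against partial line ranges. The class and function names are hypothetical, the deterministic ID and column range are omitted for brevity, and UTF-8 is assumed as the encoding hashed for the excerpt.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceSketch:
    source_id: str
    snapshot_id: str
    path: str
    start_line: Optional[int]  # both None means the source-level fallback
    end_line: Optional[int]
    summary: str
    excerpt_hash: str

    def __post_init__(self) -> None:
        # A range with only one end set is ambiguous for reviewers, so reject it.
        if (self.start_line is None) != (self.end_line is None):
            raise ValueError("partial line range is invalid")


def build_provenance(source_id: str, snapshot_id: str, path: str, statement: str,
                     start_line: Optional[int] = None,
                     end_line: Optional[int] = None) -> ProvenanceSketch:
    # Summary: first non-empty statement line, capped at 200 characters.
    first_line = next((ln.strip() for ln in statement.splitlines() if ln.strip()), "")
    return ProvenanceSketch(
        source_id=source_id,
        snapshot_id=snapshot_id,
        path=path,
        start_line=start_line,
        end_line=end_line,
        summary=first_line[:200],
        excerpt_hash=hashlib.sha256(statement.encode("utf-8")).hexdigest(),
    )
```

A quote reader can later recompute the SHA-256 of the text it is about to display and compare it with excerpt_hash before showing the excerpt.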

Distilled instruction categories

Categorization is deterministic and heuristic; no model call is made when a source is ingested. Each categorization result records the matched rule name and confidence. Unmatched statements are not written as proposals; they are returned in the ingestion summary as unclassified warnings for operator review.

| Category | Meaning |
| --- | --- |
| instruction | General runtime guidance for agents. |
| policy | Governance, approval, or authority requirement. |
| preference | Stable user, team, or project preference. |
| command | Command or validation instruction. |
| boundary | Scope, ownership, or authority boundary. |
| handoff_rule | Requirement for durable handoff content or timing. |
| memory_rule | Rule for memory reads, writes, proposals, or promotion. |
| security_rule | Security, secret, or safety-sensitive requirement. |
| stale_risk | Warning that prior context may become stale or unsafe. |
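
To make "deterministic and heuristic" concrete, here is a toy rule-based categorizer. Everything in it is illustrative: the rule names, regular expressions, and confidence values are invented for the example and are not Craik's actual rules. It only shows the shape of the result (category, matched rule, confidence) and how unmatched statements end up as unclassified warnings instead of proposals.

```python
import re
from typing import NamedTuple, Optional


class Categorization(NamedTuple):
    category: str
    rule: str
    confidence: float


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("security_keyword", re.compile(r"\b(secret|credential|token|password)s?\b", re.I), "security_rule", 0.9),
    ("approval_keyword", re.compile(r"\b(approval|approve)\b", re.I), "policy", 0.8),
    ("handoff_keyword", re.compile(r"\bhandoff\b", re.I), "handoff_rule", 0.7),
    ("run_verb", re.compile(r"^\s*(run|execute)\b", re.I), "command", 0.7),
]


def categorize(statement: str) -> Optional[Categorization]:
    """Return the first matching rule's result, or None when nothing matches."""
    for name, pattern, category, confidence in RULES:
        if pattern.search(statement):
            return Categorization(category, name, confidence)
    return None


def split_classified(statements: list[str]):
    """Separate statements into (proposals, unclassified warnings)."""
    proposals, unclassified = [], []
    for statement in statements:
        result = categorize(statement)
        if result is None:
            unclassified.append(statement)  # surfaced for operator review
        else:
            proposals.append((statement, result))
    return proposals, unclassified
```

Because the rules are plain pattern matches evaluated in a fixed order, the same statement always yields the same category and rule name, which is what makes the result auditable without a model call.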

Policy and security-rule proposals require evidence.

Proposals in the policy and security_rule categories must carry evidence in addition to provenance. Approved proposals must include a promoted constraint ID plus reviewer and decision time. Rejected and deferred proposals also preserve reviewer and decision time.

ingest_project_instructions(store, project_id) runs the declared source pipeline end to end: refresh snapshots, parse sources, persist provenance, categorize statements, persist distillation proposals, and return source, snapshot, provenance, proposal, and unclassified counts.

Stale invalidation

Snapshots are compared by source ID and content hash. Distillations are deferred when their source changes, goes missing, is newly discovered, or is omitted from the current scan. Deferral preserves the proposal, provenance, evidence, and previous review decision for audit, but the proposal is excluded from automatic promotion until it is reviewed again.
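
The four deferral triggers can be sketched as a comparison between two snapshot maps. The function name sources_to_defer and the reason strings are hypothetical; each map goes from source ID to content hash, with None standing in for a source that was scanned but not found.

```python
from typing import Optional


def sources_to_defer(previous: dict[str, Optional[str]],
                     current: dict[str, Optional[str]]) -> dict[str, str]:
    """Map each source ID whose distillations should be deferred to a reason."""
    reasons = {}
    for source_id in previous.keys() | current.keys():
        if source_id not in current:
            reasons[source_id] = "omitted"   # absent from the current scan
        elif source_id not in previous:
            reasons[source_id] = "new"       # newly discovered source
        elif current[source_id] is None:
            reasons[source_id] = "missing"   # declared but not found
        elif current[source_id] != previous[source_id]:
            reasons[source_id] = "changed"   # content hash differs
    return reasons
```

Sources that appear in both maps with the same hash are left out of the result, so their distillations keep their existing review state.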

Case files and onboarding reports surface stale instruction warnings so agents do not treat outdated distilled instructions as active constraints.

Instruction conflicts

Open a contradiction report

Incompatible instruction, policy, command, boundary, and security_rule proposals open a contradiction report. Reports link conflicting proposal IDs, source IDs, and provenance IDs.

Keep as reviewable disagreement

preference and stale_risk disagreements stay as reviewable proposals because they may represent tolerable local variation rather than mutually exclusive authority.

Conflicting proposals are deferred and excluded from automatic promotion until the contradiction is reviewed.

Contradiction detection compares proposals across different sources only. It normalizes each governing candidate into a category, subject, and value before comparison, so tool allowlist policies, command allow/deny rules, and boundary negations can disagree without relying on raw string equality. Deferred stale proposals and rejected proposals are skipped; same-source disagreements remain local to that source until a later operator review path handles them.
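
The comparison described above might look roughly like this once each candidate has been normalized. The sketch assumes normalization has already happened and only shows the pairing logic; the class name, field names, and status strings are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

# Categories that open contradiction reports when they disagree.
GOVERNING_CATEGORIES = {"instruction", "policy", "command", "boundary", "security_rule"}


@dataclass(frozen=True)
class NormalizedProposal:
    proposal_id: str
    source_id: str
    category: str
    subject: str   # e.g. the command or tool the rule is about
    value: str     # e.g. "allow" or "deny"
    status: str = "proposed"


def find_contradictions(proposals: list[NormalizedProposal]) -> list[tuple[str, str]]:
    """Return sorted pairs of proposal IDs that disagree across different sources."""
    by_key = defaultdict(list)
    for proposal in proposals:
        if proposal.category not in GOVERNING_CATEGORIES:
            continue  # preference and stale_risk stay as reviewable disagreement
        if proposal.status in {"deferred", "rejected"}:
            continue  # deferred stale and rejected proposals are skipped
        by_key[(proposal.category, proposal.subject)].append(proposal)

    pairs = []
    for group in by_key.values():
        for a, b in combinations(group, 2):
            # Same-source disagreements remain local to that source.
            if a.source_id != b.source_id and a.value != b.value:
                pairs.append((a.proposal_id, b.proposal_id))
    return sorted(pairs)
```

Comparing on (category, subject, value) rather than raw text is what lets "never force-push" in one file and "force-push is fine on feature branches" in another be recognized as statements about the same subject.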

Promotion reviews

craik.instruction_promotion_review records approved, rejected, and deferred promotion decisions. Reviews link policy envelopes, receipts, memory proposals, and handoffs.

The runtime approval API requires an explicit operator identity. Fresh ingestion leaves proposals in proposed; approve_instruction moves the item to governing, writes the approval review, and creates an active promoted constraint. reject_instruction writes a denial review and leaves no active constraint. Re-approving an already governing item returns the existing review and constraint instead of duplicating receipts.

Stale or contradicted proposals cannot become governing unless the operator sets an override and records override rationale. The approval review records whether stale or contradiction guards were bypassed. list_governing returns only active constraints backed by governing, non-contradicted proposals; downstream case files and prompt compilation must consume that list rather than raw proposal rows.
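
The approval guards can be sketched as a small state transition. This is not the approve_instruction API: the function approve_sketch, the ProposalState class, the GuardError exception, and the constraint ID format are all invented for illustration. It shows three behaviours from the text: an operator identity is required, stale or contradicted proposals need an override with a rationale, and re-approving a governing item returns the existing constraint.

```python
from dataclasses import dataclass
from typing import Optional


class GuardError(Exception):
    """Raised when a stale or contradicted proposal is approved without an override."""


@dataclass
class ProposalState:
    proposal_id: str
    status: str = "proposed"
    stale: bool = False
    contradicted: bool = False
    constraint_id: Optional[str] = None


def approve_sketch(proposal: ProposalState, operator: str,
                   override: bool = False, rationale: Optional[str] = None) -> dict:
    if not operator:
        raise ValueError("an explicit operator identity is required")
    if proposal.status == "governing":
        # Idempotent: return the existing constraint instead of duplicating it.
        return {"constraint_id": proposal.constraint_id, "created": False}
    if (proposal.stale or proposal.contradicted) and not (override and rationale):
        raise GuardError("override and rationale required for stale or contradicted proposals")
    proposal.status = "governing"
    proposal.constraint_id = f"constraint:{proposal.proposal_id}"  # hypothetical ID format
    return {
        "constraint_id": proposal.constraint_id,
        "created": True,
        "operator": operator,
        # The review records which guards were bypassed.
        "bypassed_stale": proposal.stale,
        "bypassed_contradiction": proposal.contradicted,
        "override_rationale": rationale,
    }
```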

Approved reviews create craik.promoted_instruction_constraint records. Active constraints retain the proposal ID, source ID, source snapshot ID, provenance IDs, evidence IDs, and review links.

Only approved, non-contradicted constraints are consumed.

Proposed, rejected, deferred, stale, or contradicted distillations remain visible for review but inactive.

Runtime consumption

Case files

Include active constraints in context_budget.active_instruction_constraints and first-class governing distillation entries in distillations.

Prompts

Compilation renders governing distillations in a separate authoritative section with provenance annotations.

Onboarding

Reports include active instruction summaries in the project model.

Handoffs

Carry active constraint IDs forward as context debt.

Case-file distillation entries are grouped in deterministic category order and include proposal ID, constraint ID, source ID, source snapshot ID, category, statement, provenance ranges, and a snapshot of the approval review. Rejected, superseded, deferred stale, or contradicted proposals are excluded from newly built case files; previously persisted case-file revisions remain immutable records of the context available at build time.
