Instruction sources
What you'll find here
The registry, hash state, provenance, categories, stale invalidation, conflict handling, and promotion rules that turn declared instruction files into runtime constraints.
Not every Markdown file is authority.
Craik does not treat raw instruction files as runtime authority. Sources declare candidate evidence; promotion is a separate gate.
Supported sources
| Kind | Path |
|---|---|
| agents_md | AGENTS.md |
| claude_md | CLAUDE.md |
| gemini_md | GEMINI.md |
| hermes_md | HERMES.md |
| skills_md | SKILLS.md |
| cursor_rules | .cursorrules |
| github_copilot_instructions | .github/copilot-instructions.md |
| codex_instructions | .codex/instructions.md |
| policy_doc | Explicitly declared project policy doc path |
Standard source kinds must use their canonical path. policy_doc
sources declare their own path and must be listed in the registry's
declared_policy_doc_paths.
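The path rule above can be illustrated with a small validator. This is a minimal sketch, not Craik's implementation: `CANONICAL_PATHS` mirrors the table, while `validate_source` and its error messages are hypothetical names introduced only for illustration.

```python
# Canonical path for each standard source kind, copied from the table above.
CANONICAL_PATHS = {
    "agents_md": "AGENTS.md",
    "claude_md": "CLAUDE.md",
    "gemini_md": "GEMINI.md",
    "hermes_md": "HERMES.md",
    "skills_md": "SKILLS.md",
    "cursor_rules": ".cursorrules",
    "github_copilot_instructions": ".github/copilot-instructions.md",
    "codex_instructions": ".codex/instructions.md",
}


def validate_source(kind: str, path: str, declared_policy_doc_paths: list[str]) -> None:
    """Hypothetical check: raise ValueError if a declared source breaks the path rule."""
    if kind == "policy_doc":
        # policy_doc sources choose their own path, but it must be declared.
        if path not in declared_policy_doc_paths:
            raise ValueError(f"policy_doc path not declared in registry: {path}")
        return
    expected = CANONICAL_PATHS.get(kind)
    if expected is None:
        raise ValueError(f"unknown source kind: {kind}")
    if path != expected:
        raise ValueError(f"{kind} must use canonical path {expected}, got {path}")
```

Under these assumptions, `validate_source("agents_md", "docs/AGENTS.md", [])` is rejected, while a `policy_doc` at `docs/policy.md` passes only when that path appears in `declared_policy_doc_paths`.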
Declared paths stay inside the project.
Instruction ingestion resolves every source path under the registered project root and rejects paths that escape it. Standard Markdown-shaped sources emit one candidate per bullet outside fenced code blocks. Declared policy documents are captured as one free-form statement block so policy prose keeps its review context.
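A rough sketch of the two behaviours described above, path confinement and bullet extraction, might look like this. The function names are hypothetical, and the bullet parser is deliberately simplified: it treats each bullet line as one candidate and ignores continuation lines, which the real extractor may handle differently.

```python
from pathlib import Path


def resolve_inside_root(project_root: str, declared_path: str) -> Path:
    """Resolve declared_path under project_root and reject paths that escape it."""
    root = Path(project_root).resolve()
    candidate = (root / declared_path).resolve()
    # is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes project root: {declared_path}")
    return candidate


def bullet_candidates(markdown: str) -> list[str]:
    """Return one candidate per bullet line that sits outside fenced code blocks."""
    candidates = []
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        # A fence marker toggles whether we are inside a code block.
        if stripped.startswith("```") or stripped.startswith("~~~"):
            in_fence = not in_fence
            continue
        if not in_fence and stripped[:2] in ("- ", "* ", "+ "):
            candidates.append(stripped[2:].strip())
    return candidates
```

Resolving before comparing means that `..` segments and absolute paths are both caught by the same check.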
Detection order
Craik never scans arbitrary Markdown as authority. A source becomes eligible only after registration, and downstream ingestion follows the registered source list in deterministic order:
- Canonical root instruction files: AGENTS.md, CLAUDE.md, GEMINI.md, HERMES.md, SKILLS.md.
- Tool-specific instruction files: .cursorrules, .github/copilot-instructions.md, .codex/instructions.md.
- Explicitly declared policy_doc paths in registry order.
Within a project, the runtime registry stores active sources in stable kind/path order so repeated registration and ingestion passes produce predictable downstream records.
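The three-tier detection order listed above could be expressed as a simple ordering function. This is an illustrative sketch only: `KIND_ORDER` and `ingestion_order` are hypothetical names, and the actual registry sort may differ in detail.

```python
# Standard kinds in the order they appear in the detection list:
# canonical root files first, then tool-specific files.
KIND_ORDER = [
    "agents_md", "claude_md", "gemini_md", "hermes_md", "skills_md",
    "cursor_rules", "github_copilot_instructions", "codex_instructions",
]


def ingestion_order(sources: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Order (kind, path) pairs: standard kinds by tier, then policy docs.

    `sources` is assumed to be in registry order, so policy_doc entries
    keep their relative order.
    """
    standard = sorted(
        (s for s in sources if s[0] != "policy_doc"),
        key=lambda s: KIND_ORDER.index(s[0]),
    )
    policy_docs = [s for s in sources if s[0] == "policy_doc"]
    return standard + policy_docs
```

Because the result depends only on the registered list, repeated passes over the same registry yield the same sequence.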
Registry boundaries
craik.instruction_source_registry is project-scoped. It records
declared sources, active source IDs, and policy doc paths. Active
source IDs must refer to registered active sources.
Discovery, not approval.
The registry is a discovery boundary, not an approval boundary. Later distillation and promotion steps must still preserve provenance, stale-source state, contradiction reports, and human approval before extracted instructions become active runtime constraints.
Registration is a receipted discovery action. The runtime
register_source API writes four records:
craik.instruction_source
The current declared source entry used by ingestion.
craik.instruction_source_registration
The immutable registration event with owner, actor, path, trust boundary, and optional content hash.
craik.instruction_registry_receipt
The audit record proving the registration action completed.
craik.instruction_source_registry
The project-level registry containing active source IDs and policy document paths.
Hash state and provenance
craik.instruction_source_snapshot records observed source identity
with a sha256 content hash when the source is present. Snapshot
refresh reads each active registered source relative to the project
repo root, applies the same path confinement as ingestion, normalizes
line endings to \n, and hashes the normalized bytes. Each refresh
also records byte count and line count for present files.
Snapshot states: unchanged, changed, missing, new, oversize.
refresh_project_snapshots(store, project_id) persists the current
snapshot set before returning it. That returned set is the input to
stale invalidation; invalidation compares it with the prior stored
snapshot for each source so a persisted refresh can still defer
distillations whose source changed, disappeared, or was newly
observed.
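The normalize-then-hash step described above can be sketched with the standard library. `snapshot_content` is a hypothetical helper, and computing the byte count on the normalized bytes (rather than the raw file) is an assumption, since the text does not say which is recorded.

```python
import hashlib


def snapshot_content(raw: bytes) -> dict:
    """Normalize line endings to \\n, then hash and measure the result."""
    # Replace CRLF first so a lone CR pass does not double-convert it.
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return {
        "sha256": hashlib.sha256(normalized).hexdigest(),
        # Assumption: counts are taken after normalization.
        "byte_count": len(normalized),
        "line_count": len(normalized.splitlines()),
    }
```

Normalizing before hashing means a checkout with Windows line endings and one with Unix line endings produce the same content hash, so a line-ending change alone does not register as a changed source.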
craik.instruction_provenance links each parsed statement candidate
back to its source snapshot. The extractor emits one provenance record
per statement with deterministic ID, source ID, snapshot ID, path,
line range, column range, summary, and excerpt_hash. The summary is
the first non-empty statement line capped at 200 characters. The
excerpt hash is the SHA-256 digest of the extracted statement text, so
future quote readers can verify the source excerpt before display.
Provenance uses a precise line range when available or a source-level fallback when the extractor cannot identify stable lines. Partial line ranges are invalid because they make review ambiguous.
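As an illustration of these rules, the sketch below builds a provenance-like record with the 200-character summary cap, the SHA-256 excerpt hash, and the all-or-nothing line range check. The `Provenance` dataclass and `build_provenance` are hypothetical; the deterministic ID and column range are omitted for brevity.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    source_id: str
    snapshot_id: str
    path: str
    line_start: Optional[int]  # None means source-level fallback
    line_end: Optional[int]
    summary: str
    excerpt_hash: str


def build_provenance(source_id: str, snapshot_id: str, path: str, statement: str,
                     line_start: Optional[int] = None,
                     line_end: Optional[int] = None) -> Provenance:
    # A range must be fully specified or fully absent; partial ranges are invalid.
    if (line_start is None) != (line_end is None):
        raise ValueError("partial line range is not allowed")
    # Summary: first non-empty statement line, capped at 200 characters.
    first_line = next((ln.strip() for ln in statement.splitlines() if ln.strip()), "")
    return Provenance(
        source_id=source_id,
        snapshot_id=snapshot_id,
        path=path,
        line_start=line_start,
        line_end=line_end,
        summary=first_line[:200],
        excerpt_hash=hashlib.sha256(statement.encode("utf-8")).hexdigest(),
    )
```

A reader that later displays the excerpt can recompute the digest of the quoted text and compare it with `excerpt_hash` before showing it.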
Distilled instruction categories
Categorization is deterministic and heuristic; no model call is made when a source is ingested. Each categorization result records the matched rule name and confidence. Unmatched statements are not written as proposals; they are returned in the ingestion summary as unclassified warnings for operator review.
| Category | Meaning |
|---|---|
| instruction | General runtime guidance for agents. |
| policy | Governance, approval, or authority requirement. |
| preference | Stable user, team, or project preference. |
| command | Command or validation instruction. |
| boundary | Scope, ownership, or authority boundary. |
| handoff_rule | Requirement for durable handoff content or timing. |
| memory_rule | Rule for memory reads, writes, proposals, or promotion. |
| security_rule | Security, secret, or safety-sensitive requirement. |
| stale_risk | Warning that prior context may become stale or unsafe. |
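A deterministic, rule-based categorizer of the kind described above might be shaped like the sketch below. The rule names, patterns, and confidence values are invented for illustration and are not Craik's actual rule set; only the category names come from the table.

```python
import re
from typing import Optional

# Hypothetical rules: (rule name, category, confidence, pattern).
# The first matching rule wins, so the order is part of the behaviour.
RULES = [
    ("security_keyword", "security_rule", 0.9,
     re.compile(r"\b(secret|credential|token|password)s?\b", re.I)),
    ("approval_keyword", "policy", 0.8,
     re.compile(r"\b(approval|approve|authorized)\b", re.I)),
    ("run_command", "command", 0.7,
     re.compile(r"^\s*run\b|`[^`]+`", re.I)),
]


def categorize(statement: str) -> Optional[dict]:
    """Return the matched category, rule name, and confidence, or None."""
    for rule_name, category, confidence, pattern in RULES:
        if pattern.search(statement):
            return {"category": category, "rule": rule_name, "confidence": confidence}
    # Unmatched statements are reported as unclassified, not written as proposals.
    return None
```

Because the rules are plain pattern matches evaluated in a fixed order, the same statement always lands in the same category, and a `None` result maps to the unclassified warnings in the ingestion summary.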
Policy and security-rule proposals require evidence in addition to provenance.
Approved proposals must include a promoted constraint ID plus reviewer and decision time. Rejected and deferred proposals also preserve reviewer and decision time.
ingest_project_instructions(store, project_id) runs the declared
source pipeline end to end: refresh snapshots, parse sources, persist
provenance, categorize statements, persist distillation proposals, and
return source, snapshot, provenance, proposal, and unclassified counts.
Stale invalidation
Snapshots are compared by source ID and content hash. Distillations are deferred when their source changes, goes missing, is newly discovered, or is omitted from the current scan. Deferral preserves the proposal, provenance, evidence, and previous review decision for audit, but the proposal is excluded from automatic promotion until it is reviewed again.
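The comparison can be pictured as a diff between two snapshot sets keyed by source ID. The sketch below is an assumption-laden illustration: `sources_to_defer` is a hypothetical helper, each set is modelled as a mapping from source ID to content hash (with `None` for a missing file), and the reason labels are invented.

```python
from typing import Optional


def sources_to_defer(prior: dict[str, Optional[str]],
                     current: dict[str, Optional[str]]) -> dict[str, str]:
    """Map each source ID whose distillations should be deferred to a reason."""
    reasons = {}
    for source_id, content_hash in current.items():
        if content_hash is None:
            reasons[source_id] = "missing"      # file no longer present
        elif source_id not in prior:
            reasons[source_id] = "new"          # first time this source is observed
        elif prior[source_id] != content_hash:
            reasons[source_id] = "changed"      # content hash differs
    for source_id in prior:
        if source_id not in current:
            reasons[source_id] = "omitted"      # absent from the current scan
    return reasons
```

Sources that do not appear in the result are unchanged, so their distillations keep their existing review state.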
Case files and onboarding reports surface stale instruction warnings so agents do not treat outdated distilled instructions as active constraints.
Instruction conflicts
Open a contradiction report
Incompatible instruction · policy · command · boundary · security_rule proposals. Reports link conflicting proposal IDs, source IDs, and provenance IDs.
Keep as reviewable disagreement
preference and stale_risk disagreements stay as reviewable proposals because they may represent tolerable local variation rather than mutually exclusive authority.
Conflicting proposals are deferred and excluded from automatic promotion until the contradiction is reviewed.
Contradiction detection compares proposals across different sources only. It normalizes each governing candidate into a category, subject, and value before comparison, so tool allowlist policies, command allow/deny rules, and boundary negations can disagree without relying on raw string equality. Deferred stale proposals and rejected proposals are skipped; same-source disagreements remain local to that source until a later operator review path handles them.
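A minimal sketch of this comparison, assuming each proposal has already been normalized into a category, subject, and value, is shown below. The dictionary shape, the `status` values, and `find_contradictions` are hypothetical; the real normalization of allowlists, command rules, and boundary negations is not shown.

```python
from itertools import combinations

# Categories whose disagreements open a contradiction report.
GOVERNING_CATEGORIES = {"instruction", "policy", "command", "boundary", "security_rule"}


def find_contradictions(proposals: list[dict]) -> list[tuple[str, str]]:
    """Return pairs of proposal IDs that disagree across different sources."""
    eligible = [
        p for p in proposals
        if p["category"] in GOVERNING_CATEGORIES
        and p["status"] not in {"deferred", "rejected"}  # skipped per the rules above
    ]
    conflicts = []
    for a, b in combinations(eligible, 2):
        if a["source_id"] == b["source_id"]:
            continue  # same-source disagreements stay local to that source
        same_topic = (a["category"], a["subject"]) == (b["category"], b["subject"])
        if same_topic and a["value"] != b["value"]:
            conflicts.append((a["id"], b["id"]))
    return conflicts
```

Comparing normalized triples rather than raw text is what lets "allow `npm publish`" in one file conflict with "never run `npm publish`" in another even though the strings share little.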
Promotion reviews
craik.instruction_promotion_review records approved, rejected, and
deferred promotion decisions. Reviews link policy envelopes, receipts,
memory proposals, and handoffs.
The runtime approval API requires an explicit operator identity. Fresh
ingestion leaves proposals in proposed; approve_instruction moves
the item to governing, writes the approval review, and creates an
active promoted constraint. reject_instruction writes a denial review
and leaves no active constraint. Re-approving an already governing item
returns the existing review and constraint instead of duplicating
receipts.
Stale or contradicted proposals cannot become governing unless the
operator sets an override and records override rationale. The approval
review records whether stale or contradiction guards were bypassed.
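The approval guards described above can be sketched as a small state transition. This is not the real `approve_instruction` signature: the `approve` function, the dictionary-based proposal, and the field names are assumptions made for illustration, and receipt and constraint creation are left out.

```python
from typing import Optional


def approve(proposal: dict, operator: str, override: bool = False,
            rationale: Optional[str] = None) -> dict:
    """Move a proposal to governing, enforcing stale and contradiction guards."""
    if not operator:
        raise ValueError("an explicit operator identity is required")
    # Re-approving a governing item returns the existing review unchanged.
    if proposal.get("state") == "governing":
        return proposal["review"]
    guards = [g for g in ("stale", "contradicted") if proposal.get(g)]
    # Guarded proposals need both the override flag and a recorded rationale.
    if guards and not (override and rationale):
        raise PermissionError(f"override with rationale required for: {guards}")
    review = {
        "decision": "approved",
        "operator": operator,
        "bypassed_guards": guards,
        "override_rationale": rationale if guards else None,
    }
    proposal["state"] = "governing"
    proposal["review"] = review
    return review
```

Recording `bypassed_guards` on the review keeps an overridden approval distinguishable from a clean one during later audits.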
list_governing returns only active constraints backed by governing,
non-contradicted proposals; downstream case files and prompt
compilation must consume that list rather than raw proposal rows.
Approved reviews create craik.promoted_instruction_constraint
records. Active constraints retain proposal ID · source ID · source
snapshot ID · provenance IDs · evidence IDs · review links.
Only approved, non-contradicted constraints are consumed.
Proposed, rejected, deferred, stale, or contradicted distillations remain visible for review but inactive.
Runtime consumption
Case files
Include active constraints in context_budget.active_instruction_constraints and first-class governing distillation entries in distillations.
Prompts
Compilation renders governing distillations in a separate authoritative section with provenance annotations.
Onboarding
Reports include active instruction summaries in the project model.
Handoffs
Carry active constraint IDs forward as context debt.
Case-file distillation entries are grouped in deterministic category order and include proposal ID, constraint ID, source ID, source snapshot ID, category, statement, provenance ranges, and a snapshot of the approval review. Rejected, superseded, deferred stale, or contradicted proposals are excluded from newly built case files; previously persisted case-file revisions remain immutable records of the context available at build time.