Learning loops
What you'll do
Walk through Craik's learning-loop discipline — telemetry, proposals, replay, receipts, promotion gates, rollbacks, and trajectory exports. Loops turn observed skill behavior into reviewable improvement records. They never let an agent silently rewrite reusable guidance.
No silent self-modification.
A skill proposal becomes promoted guidance only after explicit approval by a non-agent reviewer. Missing promotion gates produce a denied decision — and denied decisions are valuable review artifacts.
Supported flow
- Record skill telemetry for an invocation.
- Draft a skill proposal from telemetry, evidence, and receipts.
- Run skill replay against redacted fixtures.
- Record review, replay, promotion, rollback, and export decisions with learning receipts.
- Apply skill promotion gates before promoted guidance changes.
- Use skill rollbacks when a promoted version regresses.
- Use training trajectory exports and compressed summaries for replay and review.
Learning loops can also use memory review nudges and preference facts when repeated behavior suggests a reviewable memory update or preference clarification.
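The decisions named in the supported flow can be pictured as small, id-only records. The following is a minimal sketch, not Craik's actual schema: the `DecisionKind` enum, the `LearningReceipt` fields, and the `record_decision` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionKind(Enum):
    """Decision types the supported flow records with learning receipts."""
    REVIEW = "review"
    REPLAY = "replay"
    PROMOTION = "promotion"
    ROLLBACK = "rollback"
    EXPORT = "export"


@dataclass(frozen=True)
class LearningReceipt:
    """Hypothetical receipt shape: ids and a decision, never raw payloads."""
    receipt_id: str
    kind: DecisionKind
    proposal_id: str
    approved: bool
    reasons: tuple[str, ...] = ()


def record_decision(receipt_id, kind, proposal_id, approved, reasons=()):
    """Build a receipt; a denial without reasons is rejected."""
    if not approved and not reasons:
        raise ValueError("denied decisions must keep explicit reasons")
    return LearningReceipt(receipt_id, kind, proposal_id, approved, tuple(reasons))
```

Freezing the dataclass keeps a receipt from being edited after it is recorded, which matches the idea that a receipt is a review artifact.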
Operator commands
Craik exposes the learning-loop posture through guarded skill commands:
craik skills telemetry
craik skills proposals
craik skills eval
craik skills promote skill_proposal_docs --dry-run
craik skills rollback skill_docs --dry-run
craik skills history
These commands require an active operator session. Promotion and rollback commands default to dry-run posture and do not silently change reusable guidance. They report the missing gates until approval, replay evidence, and receipts exist.
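If these commands are copied into runbooks or review scripts, a small check can confirm that promote and rollback lines state `--dry-run` explicitly. This is a hedged sketch: the command strings are the ones listed above, while `missing_dry_run` and `GUIDANCE_CHANGING` are hypothetical helpers that are not part of Craik. The sketch only inspects text and never runs a command.

```python
import shlex

# Command lines copied from the operator command list above.
DOCUMENTED_COMMANDS = [
    "craik skills telemetry",
    "craik skills proposals",
    "craik skills eval",
    "craik skills promote skill_proposal_docs --dry-run",
    "craik skills rollback skill_docs --dry-run",
    "craik skills history",
]

# Subcommands that could change promoted guidance.
GUIDANCE_CHANGING = {"promote", "rollback"}


def missing_dry_run(commands):
    """Return promote/rollback command lines that do not pass --dry-run."""
    flagged = []
    for command in commands:
        argv = shlex.split(command)
        is_skill_command = argv[:2] == ["craik", "skills"] and len(argv) > 2
        if is_skill_command and argv[2] in GUIDANCE_CHANGING and "--dry-run" not in argv:
            flagged.append(command)
    return flagged
```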
Evidence boundary
Every learning-loop step preserves ids — not raw payloads.
- Task ids
- Policy envelope ids
- Evidence ids
- Receipt ids
- Telemetry ids
- Replay fixture ids
- Replay result ids
- Proposal ids
- Promoted version ids
- Rollback version ids
- Unresolved risk ids
Redact before persistence.
Telemetry, receipts, proposals, exports, and summaries redact secrets, private prompts, private payloads, raw outputs, traces, trajectories, credentials, and local-only filesystem paths.
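One way to picture redaction before persistence is a recursive pass that replaces sensitive fields with a marker while leaving ids intact. This is a minimal sketch under stated assumptions: the key names in `SENSITIVE_KEYS` are hypothetical stand-ins for the categories listed above, and Craik's real redaction rules may differ.

```python
from typing import Any

# Hypothetical field names standing in for the sensitive categories above.
SENSITIVE_KEYS = frozenset({
    "secret", "credentials", "private_prompt", "private_payload",
    "raw_output", "trace", "trajectory", "local_path",
})
REDACTED = "[redacted]"


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive fields replaced by a marker."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars (ids, counts, flags) pass through unchanged.
    return value
```

The function returns a new structure instead of mutating its input, so the unredacted record never needs to be written anywhere in order to produce the persisted copy.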
Promotion requirements
A proposal can become promoted guidance only after every requirement below is satisfied.
- Approved proposal
- Structured improvement plan
- Non-agent approver
- Policy envelope context
- Evidence ids
- Eval or replay result ids
- Receipt ids
- Approval receipt id
Missing promotion gates produce a denied promotion decision. Denied decisions are review artifacts and must keep explicit denial reasons.
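The gate check described above can be sketched as a function that denies by default and names each missing requirement. The gate names mirror the requirement list, but the field names, `PromotionDecision`, and `evaluate_promotion` are assumptions made for illustration, not Craik's API.

```python
from dataclasses import dataclass

# Gate names mirror the requirement list above; the identifiers are assumptions.
REQUIRED_GATES = (
    "approved_proposal",
    "structured_improvement_plan",
    "non_agent_approver",
    "policy_envelope_context",
    "evidence_ids",
    "eval_or_replay_result_ids",
    "receipt_ids",
    "approval_receipt_id",
)


@dataclass(frozen=True)
class PromotionDecision:
    proposal_id: str
    promoted: bool
    denial_reasons: tuple[str, ...] = ()


def evaluate_promotion(proposal_id: str, gates: dict) -> PromotionDecision:
    """Deny unless every required gate is present and non-empty."""
    missing = [name for name in REQUIRED_GATES if not gates.get(name)]
    reasons = tuple(f"missing gate: {name}" for name in missing)
    return PromotionDecision(proposal_id, promoted=not missing, denial_reasons=reasons)
```

A denied decision is still returned as a value with its reasons attached, so it can be stored and reviewed like any other decision.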
Rollback requirements
Rollbacks target a prior promoted version. A rollback decision preserves:
- Promoted version id
- Rollback version id
- Rollback reason & rationale
- Policy envelope context
- Evidence ids
- Receipt ids
- Replay result ids
- Rollback decision receipt
Rollbacks don't invent replacement guidance.
A rollback moves back to a known prior version and leaves an audit trail. It is not a hook for silently substituting new guidance.
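The rule that a rollback may only target a known prior version can be sketched as a membership check against the promotion history. This is an illustrative sketch: `RollbackDecision`, `decide_rollback`, and the idea of passing history as an ordered list of version ids are assumptions, not Craik's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackDecision:
    promoted_version_id: str
    rollback_version_id: str
    reason: str
    allowed: bool
    denial_reason: str = ""


def decide_rollback(history, rollback_version_id, reason):
    """Decide a rollback; history lists promoted version ids, oldest first."""
    if not history:
        raise ValueError("no promoted version to roll back from")
    current = history[-1]
    prior_versions = history[:-1]
    if rollback_version_id not in prior_versions:
        # Unknown or current targets are denied: a rollback never introduces guidance.
        return RollbackDecision(
            current, rollback_version_id, reason, False,
            "target is not a prior promoted version",
        )
    return RollbackDecision(current, rollback_version_id, reason, True)
```

The function accepts only a version id and never guidance text, so there is no path through it for substituting new guidance.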
Trajectory review
Training trajectory exports are redacted replay and review artifacts.
Full exports
Keep decision-level detail. Use these when reviewers need diagnostics, artifacts, observed output, or per-step timestamps.
Compressed summaries
Keep only the links needed for review: receipt ids, evidence ids, policy envelope ids, replay fixture ids, replay result ids, and unresolved risk ids. Decision-level detail is omitted by design.
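The relationship between a full export and its compressed summary can be sketched as a projection onto the link fields. The field names below are assumed spellings of the ids named above, and `compress_export` is a hypothetical helper, not a Craik function.

```python
# Link fields named in the compressed-summary description above (assumed spellings).
LINK_FIELDS = (
    "receipt_ids",
    "evidence_ids",
    "policy_envelope_ids",
    "replay_fixture_ids",
    "replay_result_ids",
    "unresolved_risk_ids",
)


def compress_export(full_export: dict) -> dict:
    """Keep only link ids; decision-level detail is dropped on purpose."""
    return {name: list(full_export.get(name, [])) for name in LINK_FIELDS}
```

Every link field is always present in the result, even when empty, so reviewers can tell "no unresolved risks" apart from a missing field.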
Safe diagnostics
uv run --extra dev ruff check .
uv run --extra dev mypy
uv run --extra dev pytest tests/test_skill_telemetry.py tests/test_skill_proposals.py tests/test_skill_replay.py
uv run --extra dev pytest tests/test_learning_receipts.py tests/test_skill_promotions.py tests/test_skill_rollbacks.py
uv run --extra dev pytest tests/test_trajectory_exports.py tests/test_docs.py
Expected: ruff prints "All checks passed!", mypy reports "Success: no issues found", and each pytest run reports "passed".
If a command fails, preserve the command, failing test name, and sanitized error summary in a receipt or review note. Do not copy raw prompts, credentials, private payloads, or local-only paths into public docs.
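A failure note that follows this guidance might be assembled as below. This is a rough sketch with loud assumptions: `build_failure_note` and `sanitize` are hypothetical helpers, the regular expression is a simple heuristic that masks absolute and home-relative paths only, and it does not detect credentials or prompts, which still need a real redaction step.

```python
import re

# Heuristic: absolute ("/a/b") or home-relative ("~/a/b") paths.
# Relative paths such as tests/test_docs.py are left alone.
LOCAL_PATH = re.compile(r"(?<![\w.:/])(?:~/|/)(?:[\w.\-]+/)*[\w.\-]+")
MAX_SUMMARY_CHARS = 200


def sanitize(text: str) -> str:
    """Mask local paths and cap the length of the text."""
    return LOCAL_PATH.sub("[local-path]", text)[:MAX_SUMMARY_CHARS]


def build_failure_note(command: str, failing_test: str, error_text: str) -> dict:
    """Collect the three fields the guidance asks to preserve."""
    return {
        "command": sanitize(command),
        "failing_test": sanitize(failing_test),
        "error_summary": sanitize(error_text),
    }
```

For example, an error line beginning with an absolute path to a test file keeps its line number and exception name but loses the directory layout of the machine it ran on.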