Implementation Plan · Deep Agent Ops · Cockpit 2026-07-01 · #003

Your Claude Code cockpit — and the delegation layer around it

You've built agent factories for AINA for months while your own controls sat on defaults. This is the afternoon that fixes it — grounded in how you actually work, phased so it can't sprawl.

Target ~/.claude/ + Mac↔VDS↔Hermes↔Paperclip Read 5 min scan · full when needed Source workflow wwig3dht3 · 9 agents · 956k tok
The Single Idea

Stop re-explaining the same guardrails every session. Convert your most-repeated prose rules — which erode under compaction and long-session drift — into mechanical config that makes the right thing automatic and ambient, so an ADHD founder with no internal reset mechanism doesn't have to hold it in his head.

The research corrected a false assumption: your cockpit is not blank. Hooks are mature, 227 skills and 20 plugins are installed, and bypassPermissions is deliberately on. What's genuinely never-touched is narrower and sharper — and that's exactly where the leverage is.

What you assumed was missing
  • Hooks — actually mature (127 rescue tarballs prove they fire)
  • Skills — 227 installed
  • Permission friction — already off by design
What's genuinely never-configured
  • ~/.claude/agents/zero personal subagents
  • /goal, routines, output-styles, statusLine
  • 3 automations pre-written but unwired
  • Personal skills — all 227 are vendored
Caught live during the research
Your ~/Downloads/agentops for ali repo is sitting on master with 80 dirty files and zero signal (your reconcile hook only watches the aina-platform family). And your Mac disk is at 86% used / 2.0Gi free — a real near-term failure surface.
Grounded in AgentsView — your real corpus, Apr 2 → Jul 1
The numbers turn hunches into evidence, and add three levers the first draft under-weighted: prompt maturity = grade D (missing success-criteria + verification — AgentsView calls this the highest-leverage fix → U11); 8,305 retries / 1,139 failures, retries tied to incomplete outcomes (no retry stop-rule existed → U12); one Codex session at $1,813 / 2.4B tokens, top 5 = $4,000+ (no cost checkpoints → U13). And AgentsView itself becomes the standing feedback loop (U14) — pullable read-only, no auth.
A past you already built half of this
Your agentops for ali folder (1,097 files) has real skills to adapt, not author: codex-prompt-builder (→ the U11 scaffold), codex-goals (→ /goal), codex-structured-outputs-and-gates (→ verifier), a session-closeout hook design (→ U4), and a full cost-discipline spec (→ U13). Genuinely fresh work shrinks to the retry stop-rule, the statusline, and the founder-brief style.
The deep mine's shift — advisory → deterministic
A 6-agent mine + review of ECC, your live Codex setup, and the AINA AgentOps plugin reframes the spine: "done" is currently a prompt asking Claude to verify itself. The real fixes are structural — a PreCompact hook that fires no matter what (U15), a Decision Contract judged at spec-time not merge-time (U16), an independent fact-checker that re-runs the check (U6). Only a PreCompact hook firing is a guarantee; a nudge is probabilistic.
You were right — Codex is wired, Claude Code isn't
~/.codex/config.toml has goals, memories, hooks, multi_agent all on, ~20 live automations (including U2's exact targets + a disk-pressure report), and a founder-brief skill that already exists. So U2 and U3 are ports, not authors, and the cockpit's job is bringing Claude Code to the parity Codex already runs. The AINA AgentOps plugin even has the merge/reconcile machinery (U9) in tested Python.

Contents

  1. 01The five pains that drive everything
  2. 02The sense→act→confirm loop
  3. 03The phased map
  4. 04Implementation units
  5. 05Delegation recommendation
  6. 06ADHD guardrails & risks
  7. 07Decisions only you can make
Section 01

The five pains that drive everything

Every decision below traces to an evidenced, recurring failure — not generic best practice:

PainWhy it earns a fix
Repo/branch amnesiaHighest-frequency, highest-cost. 33 project dirs in 7 days; a 627M-token "which repo is this" session. Live-verified broken now.
"Done" ≠ "Landed"The #1 named failure (13-agent forensic). Partly covered by mature Stop hooks — one escape hatch remains (direct commits to main).
Scope-pivot without closure7 pivots in one session, 4 projects abandoned-not-parked. ADHD-linked. No hook catches this — Stop fires too late.
Prose over mechanismReconcile / verify / land-it live as CLAUDE.md prose an agent silently drops under pressure. Only subagent-contracts are mechanized.
Never-configured leverageNo personal subagents; /goal, routines, output-styles, statusLine all unused; 3 automations written but idle.
Section 02

The sense→act→confirm loop (with AgentsView)

AgentsView — the new VDS session-observability app you're wiring in the other chat — is the sensing half of this cockpit's acting half. Together they close a loop you've never had.

AgentsView SEES the pattern stalls · drift · token burn Cockpit CORRECTS it hook · routine · goal · subagent AgentsView CONFIRMS the fix next week's sessions measured, not vibes
Cockpit changes get judged against what AgentsView shows — the honest feedback the ADHD guardrails need.
Section 03

The phased map

Phase 1 is three cheap, high-leverage wins landable this week — each its own commit. Additive-complexity items are sequenced behind them.

PhaseAreaThe move
1 · nowsettings / statusLine / hooksAmbient repo/branch/dirty statusline + generalize reconcile + daily branch-parking sweep + direct-to-main nudge
1 · nowscheduled routinesWire the 3 pre-written automations + disk-guard
1 · nowoutput-styles + skillsfounder-brief style + landed-check + which-repo skills
1.5hooksScope-closure pivot hook (UserPromptSubmit)
2hooksWiki-recall nudge (separate UserPromptSubmit)
2subagentsverifier · repo-reconciler · builder-lane + post-Task verify nudge
2workflowsReusable named-workflow library (your own realization)
2/goalOne trial on a known-good task shape
3delegationDecision-rule artifact + direct-Codex default
3agent-teams / PaperclipExplicitly gated, not built
1+skillsU11 Prompt-scaffold (adapt codex-prompt-builder) — highest AgentsView leverage
1.5hooks/skillsU12 Retry / tool stop-rule (genuinely fresh)
2skillsU13 Cost-guardrail checkpoints (adapt llm_billing)
2routinesU14 AgentsView weekly digest → PKM report
1hooksU15 PreCompact hard-backstop handoff hook (you asked for this)
1.5subagentsU16 Decision Contract + adversarial QA-verifier (decide-once)
1.5safetyU17 Kill-switch + freeze runbook + config-security-audit
2skillsU18 Agent-session-resume skill (port hacktivist123)
3skills-metaU19 Self-evolving-skills / instinct loop (ECC-style)
The right thing has to be automatic and ambient — because you have no internal "stop and check" mechanism when a new idea grabs you, and neither does the agent under context pressure.
Section 04

Implementation units

U1. Statusline + generalized reconcile + branch-parking sweep + direct-to-main nudgePhase 1

An ambient statusline (repo:branch (Ndirty ↑A↓B)) on every turn; generalize reconcile-prime.sh off the hardcoded aina path so it fires in any repo; a daily sweep for branches parked >24h; and a one-line nudge when you commit straight to main. Fixes the #1 pain and the live-verified gap.

U2. Wire the 3 pre-written automations + disk guardPhase 1

The prompts already exist in ~/.claude/automations/, labelled "not yet wired." Register aina-workspace-closer-audit first (it covers orphan work), verify one full run, then wire the other two. Add a cheap disk-guard — 2.0Gi free is a real risk.

U3. founder-brief output style + landed-check + which-repo skillsPhase 1

Make plain-English/decision-ready surfacing and the exact Landed/Not-landed closeout structural (loaded every session), and give the #1 failure a one-word check that runs the real git/GitHub verification instead of trusting an agent's self-report.

U4. Scope-closure pivot hookPhase 1.5

A UserPromptSubmit heuristic that notices a mid-session repo pivot with dirty prior work and injects a "close the prior thread first" nudge — advisory, never blocking. Catches the pivot when it happens, not at Stop (too late).

U5. Wiki-recall nudgePhase 2

A second, separate UserPromptSubmit heuristic (regex, no LLM) that reminds you to run pkm-agent think on new-domain work. Ships as its own commit — never built in the same session as U4.

U6. Personal subagents + post-Task verify nudgePhase 2

Formalize the parallel pattern you already run ad-hoc into named, model-pinned agents: verifier (adversarial, falsifies self-reports — the structural fix for the RunFusion repair-cascade), repo-reconciler (Haiku, isolated), builder-lane (Sonnet). Plus a nudge to verify a subagent's claims before they count.

U7. Reusable named-workflow libraryPhase 2

Your own realization — you never set up and ran workflows. A small library you fire yourself (research-a-topic, review-my-branch, mine-my-sessions), seeded from this plan's own cockpit-research script.

U8. /goal trial on a known-good shapePhase 2

One trial on a uniform+verifiable fan-out you've already proven (the aina-platform Fable pattern) — removes "keep going" re-prompting before you trust it for anything that matters.

U9. Delegation decision-rule artifact + direct-Codex defaultPhase 3

Write the Mac↔VDS↔Hermes↔Paperclip routing rule down as a durable, committed artifact; default heavy work to the proven direct-Codex lane; gate Paperclip on the 626 upgrade.

U10. Deferred experiments (gated, not built)Phase 3

Agent-teams and Paperclip-as-default stay explicitly gated with named unblock conditions, so neither silently starts and becomes an orphaned parallel pipeline.

U11. Prompt-scaffold skill — adapt codex-prompt-builderPhase 1+

Highest AgentsView leverage (prompt maturity = D). Make a minimum scaffold — Goal / Context / Constraints / Success-criteria / Verification — the default shape for prompts an agent runs, with a "too fuzzy → ask one question" escape hatch. Pair with a dr-gate so "done" is checkable, not asserted. Lift-ready from your own folder.

U12. Retry / tool stop-rulePhase 1.5

Genuinely fresh — nothing in the folder solves it. After N equivalent tool failures, force a stop-and-diagnose (failing command + error + next alternative + environment preflight) instead of looping. Directly targets the 8,305-retry bleed. Advisory, fail-open.

U13. Cost-guardrail checkpoints — adapt llm_billingPhase 2

Checkpoint long/expensive runs before they compound (the $1,813 session). A soft/hard turn-or-token threshold that forces an interim artifact + summary before continuing; a running cost hint on the statusline; /goal bound clauses to terminate runaways.

U14. AgentsView weekly digest routinePhase 2

Closes the sense→act→confirm loop. Weekly, read-only, no auth: agentsview stats --since 7d --format json → Sonnet narrates against the 6 review-kind headers → lands as a dated PKM report you can search. Generate the never-run instruction_opportunity_review manually once first.

U15. PreCompact hard-backstop handoff hookPhase 1

The session-handoff hook you asked about. Set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=65 + a PreCompact hook that writes a ~30-line pointer handoff (resume command + paths, not 300 lines of pasted content) to ~/.claude/handoffs/ before compaction wipes context. Paired with a soft statusline nudge at 50%. A guarantee, not discipline — directly fixes the "compaction-sessions do worse" finding.

U16. Decision Contract + adversarial QA-verifierPhase 1.5

Your "decide once, no PR dance" made mechanical. Fill a contract at spec-time (CAN / must-NEVER / files-may / files-must-not / acceptance / risk), write the never-list first, then a QA-verifier subagent behaves like an adversarial user, attacks each must-NEVER, and is forbidden from saying "looks good." No human in the gate — you're pulled in only when reality breaks the contract.

U17. Kill-switch + freeze runbook + config-security-auditPhase 1.5

The "2am, an agent loops and opens 6 PRs" answer: one always-known command to freeze all autonomous activity in <60s, merges-default-OFF, minimum-scope git creds. Plus a one-time AgentShield-style audit of the cockpit's own settings/hooks/CLAUDE.md/MCP.

U18. Agent-session-resume skill — port hacktivist123Phase 2

Makes "continue where we left off" evidence-graded: DONE / PARTIALLY / NOT-DONE with file:line citations, read-to-end-of-transcript, git-status-first, and it preserves your "park this" deferrals instead of silently reopening them. Doubles as founder-readable status.

U19. Self-evolving-skills / instinct loopPhase 3 · deferred

Skills that improve from your corrections (ECC's continuous-learning-v2). Cheap start: log every correction as a candidate edit, review weekly. Only for objectively-checkable skills — never taste/voice. Proposals never auto-merge.

Because you review behavior, not shell, the actual authoring is delegated. Two representative prompts:

Codex · draft U1 statusline · write the script, do NOT touch the mature Stop hooks
Write ~/.claude/hooks/statusline.sh: in the current cwd, print
"repo:branch (Ndirty ↑ahead↓behind)" using git rev-parse --show-toplevel
(basename), git branch --show-current, git status --porcelain | wc -l,
and git rev-list --left-right --count HEAD...@{u}. Fail SILENT outside a
git repo (no output, exit 0). Then add a "statusLine" key to
~/.claude/settings.json pointing at it. Do not modify closeout-stop.sh
or durability-autopush.sh in this pass.
Watch out — scope creep into the mature close-out hooks. This pass is statusline only; the direct-to-main nudge is a separate, careful edit.
Codex · draft U3 landed-check skill · it must run the REAL verification, not narrate
Create ~/.claude/skills/landed-check/SKILL.md with frontmatter
disable-model-invocation: true. On invocation it runs git status
--porcelain, git log @{u}..HEAD, and gh pr view --json state,mergeable,
then reports the exact ✅ Landed / ❌ Not-landed plain-English line from
global CLAUDE.md. No raw /srv paths. It reports actual git/GitHub state
— it never infers "done" from an agent's prior claim.
Watch out — false-done. The skill's whole job is to defeat self-report inflation, so it must exercise the real git/gh readback, not summarize the conversation.
Section 05

Delegation recommendation

Default heavy AINA build to the already-proven aina-build-lane kernel (Claude architects and never writes implementation code; Codex-over-SSH is the workhorse; the VDS is the engine; report-file + watcher; commit-per-lane; independent verifier). This works now, with zero dependency on Paperclip.

Reserve full Paperclip-factory dispatch for genuinely multi-agent, cross-lane work, and treat it as blocked-preferred until 609→626 ships native hermes_local, native codex-login, and native watchdog. Until then, anything on Paperclip inherits 609's manual-glue fragility — the same shape of gap that produced both RunFusion repair-cascade failures (April's cross-workspace-isolation break; June's self-repair loop that discarded landed work via uncontrolled concurrency + autoMerge/autoUnpause).

Do not build in the gap
A Mac-native agent-teams layer would duplicate Paperclip's coordination and create a second orphaned pipeline — the exact thing the Sidework-Inbox rule exists to prevent. The native-Paperclip-vs-Hermes decision stays parked; this recommendation is a stopgap that works under either outcome.
Section 06

ADHD guardrails & risks

The guardrails are the anti-sprawl contract — dogfooding your own scope-closure discipline on the plan itself:

GuardrailWhy
Fail-open hooks onlyInject context / print a digest — never block on a human answer. Preserves your bypassPermissions posture.
Exactly 3 Phase-1 commitsU1 · U2 · U3 land separately — a cold session verifies each. "Landed not done" applied to the plan.
Never two UserPromptSubmit hooks in one sessionU4 and U5 ship apart so heuristics don't fight over the event.
"Never configured X" ≠ configure everythingSubagents / agent-teams are Phase 2/3, after the cheap wins.
Don't rebuild what worksOnly a narrow addendum touches the mature close-out hooks.
Delegate authoring, review behaviorCodex/Sonnet drafts; you validate the observable result.

Live risks to respect: disk at 2.0Gi free (do the disk-guard early); the pivot hook must stay advisory or it becomes the fatigue it prevents; verify one automation end-to-end before wiring all three; extend reconcile-prime.sh carefully (much-iterated); output-styles/subagents only load after /clear — early "not working" is usually an unreloaded session.

Section 07

Decisions only you can make

  1. Repo-root list for the sweep — proposed: PKM-monorepo, conductor/repos/*, Downloads/*ali*, aina-vds mirrors. Confirm/add.
  2. Digest time + channel — proposed: daily ~08:00 to your notify channel. Confirm channel.
  3. ✓ Resolved — founder-brief default for ALL sessions (Ali, 2026-07-01); U3 sets the global outputStyle with keep-coding-instructions.
  4. Paperclip upgrade re-eval — target date, or purely availability-gated?
  5. Phase-2 subagents global or per-repo? — global over-triggers; per-repo re-authors.
  6. Unpark native-Paperclip-vs-Hermes now, or keep the Phase-3 stopgap?
  7. Generate the never-run instruction_opportunity_review manually first? — to see its shape before wiring the weekly loop (recommended: yes).
  8. Store weekly digests back into AgentsView, or keep PKM-only? — storing-back needs the VDS bearer token (recommended: PKM-only; store-back is a separate opt-in).
  9. Compaction thresholds — the 50% soft-nudge / 65% hard-backstop hybrid, or different numbers?
  10. Adopt ECC selectively, or lift patterns only? — install the plugin (instinct loop, AgentShield) vs hand-port the patterns (recommended: lift first, evaluate ECC install as a Phase-3 experiment).
  11. Enable Claude Code's native memories to match Codex? — Codex has it on; the cockpit could too.
Where to start

Three commits this week — an ambient statusline, three wired routines, and a founder-brief that ends every session in one plain-English line. You're one good afternoon from a cockpit that flies.