AINA · Agent Org & Cockpit · Ali Mukadam · Claude · 2026-07-01

Plans Review — Gaps & Optimizations

The Minas Tirith operating model, the cockpit plan, and the build sheet — re-read through one lens: Ali is away doing marketing, and the machine must keep shipping without him.

Review of 3 plan documents + the full decision thread · ~8 min read · anchor: AIOPS-266 · independent adversarial pass concurred

The Single Idea

If Ali disappears into marketing for two weeks, does the machine keep shipping — and if it quietly stops, does anyone find out? The plans are strong on transition mechanics but none of them treats "Ali becomes unreachable" as the condition that should reorder their own priorities. Five gaps could stop the machine silently; all are cheap to close before departure.

What the plans were written for

Ali at the laptop, opening Claude Code daily, reviewing each phase, noticing when a dashboard goes red, cleaning a full disk, un-sticking a merge queue by hand.

What they must now survive

Ali on the road for weeks. A phone browser and Slack. Fifteen minutes of pull-surface review a day. Every failure that today relies on "someone will notice" must surface itself.

01 Verdict · 02 Tier A: silent-stop gaps · 03 Tier B: unowned work · 04 Tier C: optimizations · 05 The flip checklist · 06 What's already solid

Section 01

Verdict

The three plans are strong, and the last 24 hours fixed their two riskiest parts — Donna's Jessica-stays decision removed the org-surgery steps, and her board-map closed the orphaned-work problem. But all three were written for "Ali supervises from the laptop," not "Ali is gone." Read through the away-mode lens, there are 5 gaps that could silently stop the machine, 4 pieces of work with no owner or no rule, and 5 optimizations — none hard, all cheap relative to what they protect.

An independent adversarial review pass (separate agent, same evidence) reached the same top four gaps unprompted and added two of the findings below. Its verdict matches this one: none of the three docs treats "Ali becomes unreachable" as the trigger that should reorder their own priorities.

The two biggest findings: the whole cross-model design rests on Claude actually running on the VDS — never proven there — and every safety mechanism in the org is downstream of Donna's one process staying alive, which nothing currently watches. Everything else is sequencing.

Every safety mechanism in the org sits downstream of one unwatched process. Under a "no pings" policy, its silent death looks identical to a healthy quiet week.

Section 02

Tier A — could silently stop the machine while you're away

A1 · Claude-on-VDS is unproven, and every team's design depends on it

Your directive: Sonnet 5 leads for curriculum + marketing, and a Claude↔Codex verifier pair in every team. But the June handover explicitly ran with no Claude tokens ("Codex/gpt-5.5 builds"), Hermes's Claude-subscription auth is broken upstream, and headless Claude auth has failed before. If the org flips on and the Claude seats silently can't authenticate, every content lane and every verification gate degrades — and per the no-pings posture, nobody tells you.

Fix: Add a Claude-auth canary to the flip checklist, next to the Galadriel canary. One Sonnet-5 agent and one Claude verifier must each complete a real task on the VDS before any routing depends on them. Decide the fallback now (if Claude can't run, verifiers pair as two differently configured Codex agents until it is fixed) so that a failure is a degradation, not a stall.
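The canary gate and its pre-agreed fallback can be sketched as a small routing decision. This is a minimal illustration, not existing tooling: `CanaryResult`, `claude_ready`, `verifier_pairing`, and the seat and configuration labels are all hypothetical names.

```python
from dataclasses import dataclass

# Seats that must each complete a real task on the VDS (labels are hypothetical).
REQUIRED_SEATS = {"sonnet-5-lead", "claude-verifier"}


@dataclass(frozen=True)
class CanaryResult:
    """Outcome of one canary task (assumed record shape)."""
    seat: str
    completed: bool  # True only if a real task finished end to end


def claude_ready(results: list[CanaryResult]) -> bool:
    """Claude counts as proven only if every required seat completed its task."""
    passed = {r.seat for r in results if r.completed}
    return REQUIRED_SEATS <= passed


def verifier_pairing(results: list[CanaryResult]) -> tuple[str, str]:
    """Cross-model pair if proven; otherwise the fallback decided in advance."""
    if claude_ready(results):
        return ("claude", "codex")
    # Fallback: two differently configured Codex agents until Claude auth is fixed.
    return ("codex-config-a", "codex-config-b")
```

Because the fallback is returned rather than raised, a failed canary changes who verifies instead of halting routing, which is the "degradation, not a stall" property the fix asks for.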

A2 · Nobody watches the watcher: Donna is a single point of failure with a "no pings" posture

Donna's gateway is one process on the VDS. If it dies (or the VDS reboots), the entire executive layer stops — routing, verification, escalation, the digest — and the no-pings rule means silence looks identical to health. Two weeks of quiet could be two weeks of nothing.

Fix (three small pieces): Auto-restart on her gateway and dashboard services; a heartbeat timestamp at the top of the pull surface ("org last alive: N minutes ago"); and one standing exception to no-pings: a dead-man alert that pings your Slack only when the heartbeat goes stale. That's infrastructure, not agent scaffolding; it's the one failure agents cannot self-report.
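The heartbeat banner and the dead-man decision reduce to one timestamp comparison. The sketch below assumes a 30-minute staleness threshold, which the plans do not specify; `banner` and `should_ping` are hypothetical names, and the actual Slack delivery is left out.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold; the plans do not say how stale is too stale.
STALE_AFTER = timedelta(minutes=30)


def banner(last_beat: datetime, now: datetime) -> str:
    """Text for the top of the pull surface."""
    minutes = int((now - last_beat).total_seconds() // 60)
    return f"org last alive: {minutes} minutes ago"


def should_ping(last_beat: datetime, now: datetime,
                stale_after: timedelta = STALE_AFTER) -> bool:
    """True only when the last heartbeat is older than the threshold."""
    return now - last_beat > stale_after


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last = now - timedelta(minutes=45)  # illustrative value
    print(banner(last, now), "| ping:", should_ping(last, now))
```

The check has to run from a scheduler outside the process it watches (cron or a systemd timer, for example); a dead-man check hosted inside Donna's gateway would die with it.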

Under a no-pings policy, silence looks identical to health — the dashboard must prove the org is alive, not just quiet.

A3 · The pull surface is the keystone, and it's still unbuilt

Every directive routes through it — Needs-Ali, idle teams, escalations, spend, hard-stops, preview URLs. Away from the laptop, the dashboard IS your job interface, and it must be mobile-first. It's spec'd (AIOPS-216) but not built; meanwhile Donna already proved the pattern by publishing her board-map to a mobile review page.

Fix: Make Mission-Control v1 the first build after the canary, before routines flip on (an org that escalates to a surface that doesn't exist is escalating to /dev/null). Add your side of the contract, a stated cadence: you check it for ~15 min daily, and anything sitting in Needs-Ali longer than 48h earns one Slack ping.
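The 48h rule, with its "one ping" limit, can be sketched as a filter over the Needs-Ali queue. The queue shape (item ID mapped to the time it entered the lane), the `already_pinged` set, and the item IDs in the usage note are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

NEEDS_ALI_LIMIT = timedelta(hours=48)


def items_to_ping(queue: dict[str, datetime], already_pinged: set[str],
                  now: datetime) -> list[str]:
    """IDs that have waited longer than 48h and have not been pinged yet, oldest first."""
    late = sorted(
        (entered, item_id)
        for item_id, entered in queue.items()
        if now - entered > NEEDS_ALI_LIMIT and item_id not in already_pinged
    )
    return [item_id for _, item_id in late]
```

For example, with hypothetical items `ITEM-1` (72h old), `ITEM-2` (50h old), and `ITEM-3` (2h old), the first call returns `ITEM-1` and `ITEM-2`; once those IDs are recorded in `already_pinged`, later calls return nothing for them, so each item earns exactly one ping.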

A4 · The merge path is the historic stall point, and Frodo's authority is unverified

The June board stall was a merge deadlock — conflict-stacked PRs, branch protection needing admin, no merge train. Frodo now owns merge/PR, but his actual GitHub permissions, Mergify path, and branch-protection compatibility have never been exercised. If Frodo can't actually merge, teams keep "finishing" work that never lands — false-done at org scale, precisely while you're not looking.

Fix: Make "Frodo merges one real PR end-to-end, with preview URL" a flip-checklist gate. The perfect test already exists: PR #121 (the /lesson chat-first shell, ACAD-137) is open and dirty against main.

A5 · The org's memory supply chain runs through your Mac

The nightly PKM pipeline — the memory the whole "search PKM/docs first before pinging Ali" policy leans on — runs only when your Mac is awake, and the disk is at 86% with ~2GB free. If the Mac fills up or sleeps for a week while you travel, the org's context quietly goes stale; retrieval failures then turn into either bad guesses or Ali-pings, the two things the policy exists to prevent.

Fix: Clean the disk now (the cockpit plan's disk-guard, U2, jumps to first); surface PKM freshness on the pull dashboard (brain_health already computes this, so wire it in); and treat staleness as visible degradation, not silent normal.
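A disk-guard check in the spirit of this fix can be sketched with the standard library's `shutil.disk_usage`. The 80% and 90% thresholds and the state names are assumptions, not values from the cockpit plan; this is not the plan's actual U2 implementation.

```python
import shutil


def disk_state(total: int, used: int,
               warn_at: float = 80.0, critical_at: float = 90.0) -> str:
    """Classify disk pressure from byte counts (thresholds are assumed)."""
    percent_used = 100.0 * used / total
    if percent_used >= critical_at:
        return "critical"
    if percent_used >= warn_at:
        return "degraded"
    return "ok"


def check_path(path: str = "/") -> str:
    """Classify the volume that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return disk_state(usage.total, usage.used)
```

Under these assumed thresholds, a disk at 86% would already report "degraded", which is the visible-degradation signal the fix wants on the pull dashboard rather than a silent normal.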

Section 03

Tier B — real work that currently has no owner

B1 · The COO agent's disposal became unowned when Jessica-stays landed

The live COO agent (a 10th top-level agent dispatching build lanes via a file-marker, the exact external scaffolding you want gone) was going to be retired as part of "retire Jessica." That step was cancelled, but the COO's retirement was never reassigned: it's still live, still dispatching, and now nobody owns removing it. A file-marker dispatcher racing Jessica's router is the same uncontrolled-concurrency shape that caused the June repair-cascade failure.

Fix: Assign an explicit owner. Fold the COO's dispatch loop into Jessica's internal routing (she's the router now), keep the COO's skepticism ("movement ≠ progress") with Donna, and retire the agent and its file-marker. One issue, one owner.

B2 · The heads' memory/learning loops are described, not owned

The operating model's best section (per-head MEMORY.md, weekly distill, FTS5 recall, LLM-Wiki for domain heads) is a real build — the memory dirs are empty today — and it's gated on three decisions still sitting with you from the agent-memory study: go/no-go on wiring PKM reach, Gandalf-vs-Finch as owner, and which memory tiers to build.

Fix: Decide the three now (recommendation: go; Finch owns it as Loremaster; build Tier-1 bootstrap + weekly distill first, the fancy tiers later) and let Donna board-map the build. At minimum, the 2–3 heads whose routines get test-fired need a working MEMORY.md with one verified read-after-write round-trip each, not just "the directory exists." Without this, "search first before pinging Ali" has nothing to search, and small inefficiencies compound instead of distilling away.
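At the file level, a read-after-write round-trip amounts to writing a unique marker and confirming it can be read back. The sketch below covers only that file-level check; it says nothing about whether an agent's own recall path finds the entry. `round_trip_ok` is a hypothetical helper, and the marker format is an assumption.

```python
import uuid
from pathlib import Path


def round_trip_ok(memory_file: Path) -> bool:
    """Append a unique marker line, re-read the file, and confirm the marker is present."""
    marker = f"canary-{uuid.uuid4().hex}"
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    with memory_file.open("a", encoding="utf-8") as handle:
        handle.write(f"- {marker}\n")
    return marker in memory_file.read_text(encoding="utf-8")
```

Each call leaves one marker line behind, so a real check would either clean up after itself or write to a dedicated canary section of the file.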

B3 · "No endless loops, no stalls" is a directive without a mechanism

Your guardrail replaced budget caps — but the mechanics aren't attached yet: 0 of 7 routines fire, no task-watchdogs are attached, and the two cheap known gaps from June 29 (per-run iteration caps, trigger sweep) are still open.

Fix: Encode it as three concrete things: max-iteration caps on every routine; native task-watchdogs on the ~8 lane roots (Guard-tier reviewers, never the lane's own head); and a Donna stall-review routine (anything unchanged for >24h goes to the pull surface).
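Two of the three mechanisms, the iteration cap and the 24h stall review, are simple enough to sketch. The names `run_capped`, `IterationCapExceeded`, and `stalled` are hypothetical, and the task-to-timestamp mapping is an assumed data shape; the watchdog assignment is organisational and is not modelled here.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

STALL_AFTER = timedelta(hours=24)


class IterationCapExceeded(RuntimeError):
    """Raised when a routine hits its cap without finishing."""


def run_capped(step: Callable[[], bool], max_iterations: int) -> int:
    """Call step() until it returns True; return the attempt count or raise at the cap."""
    for attempt in range(1, max_iterations + 1):
        if step():
            return attempt
    raise IterationCapExceeded(f"gave up after {max_iterations} iterations")


def stalled(last_changed: dict[str, datetime], now: datetime) -> list[str]:
    """Task IDs unchanged for more than 24h, sorted for stable display."""
    return sorted(task for task, ts in last_changed.items() if now - ts > STALL_AFTER)
```

Raising at the cap, rather than returning quietly, turns an endless loop into a visible failure that a stall review or escalation can pick up.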

B4 · Cross-department work has no reviewer rule (found by the adversarial pass)

The Guard-tier map is static — Gimli/Éowyn cover Engineering, Sauron's team covers Security, and so on. But real lanes cross departments (an engineering change touching auth code is the classic case). Today it's undefined which watchdog fires when a lane's actual blast radius doesn't match the department it was filed under — maybe both, maybe neither.

Fix: Adopt one fallback rule. A lane touching paths outside its department's declared ownership escalates to Donna for reviewer reassignment; it never silently defaults to the original department's reviewer.
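The fallback rule can be sketched as a path-ownership lookup. Everything concrete here is an assumption: the `OWNERSHIP` prefixes, the longest-prefix-wins tie-break, and the route labels are illustrative, not the org's actual Guard-tier map.

```python
from typing import Optional

# Hypothetical declared ownership: department -> path prefixes.
OWNERSHIP = {
    "engineering": ("src/", "tests/"),
    "security": ("src/auth/", "infra/secrets/"),
}


def owner_of(path: str) -> Optional[str]:
    """Department with the longest matching prefix, or None if nobody claims the path."""
    best, best_len = None, -1
    for dept, prefixes in OWNERSHIP.items():
        for prefix in prefixes:
            if path.startswith(prefix) and len(prefix) > best_len:
                best, best_len = dept, len(prefix)
    return best


def reviewer_route(filed_under: str, touched: list[str]) -> str:
    """Keep the department's reviewer only if every touched path belongs to it."""
    outside = [p for p in touched if owner_of(p) != filed_under]
    return "escalate-to-donna" if outside else f"{filed_under}-guard"
```

Under these assumed prefixes, an engineering lane that touches `src/auth/login.py` escalates, as does one that touches a path no department claims; the rule never falls back to the filing department by default.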

Section 04

Tier C — optimizations (make good plans better)

C1 · Re-rank cockpit Phase 1 for away-mode, and let it lose the fight for your attention

The current cut (statusline, PreCompact, prompt-scaffold…) assumes you're in terminal sessions daily; the build sheet even gates on "say go and review behavior." Both plans compete for the same scarce resource, your pre-departure attention, and the org-safety items must win: your last hands-on sessions go to the two canaries, the dead-man switch, the pull surface, and Frodo's first real merge, not to cockpit polish.

Within the cockpit itself, U2 (automations + disk-guard) and U14 (AgentsView weekly digest) jump up: they protect the unattended Mac and reach you where you are, whereas statusline and prompt-scaffold only pay off when you're in a session. Keep U15 (PreCompact backstop) high. The plan's 11 open decisions all have sane recommended defaults; adopt them wholesale (all are reversible config) instead of leaving a decision queue on the person who's leaving.

C2 · Rewrite the Minas Tirith doc as a slim v2

It's the one document a cold agent would read and get wrong: the manifest-corruption landmine turned out false, three sections still say "retire Jessica / new Donna record," Security says Théoden not Sauron, and Donna's budget-policy ask is superseded by no-budget. The execution plan shrinks from 11 steps to ~6. Worth an hour before anyone treats it as reference.

C3 · Consolidate the charter into one document

Your directives live across ~8 Linear comments plus four charter docs. For an org whose prime rule is "search before asking," the CEO charter should be ONE searchable document — which is exactly what Donna asked for. Finch compiles; comments become history.

C4 · A daily credentials-and-quota preflight

Tokens and quotas expire on their own schedule (Claude's interactive-only auth, Cloudflare, GitHub, the Vertex 429 history). One cheap daily routine checks each critical credential and surfaces days-to-expiry into the pull surface's Hard-stop lane — so an expired token reads as "known, queued" instead of a mystery stall three lanes deep.
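The days-to-expiry report can be sketched as below. It assumes each credential's expiry date is already known; in practice, how (or whether) each provider exposes that date varies and would need its own lookup. The 7-day warning window, the function names, and the dates in the usage note are illustrative assumptions.

```python
from datetime import date


def days_to_expiry(expires_on: date, today: date) -> int:
    """Whole days remaining; negative means already expired."""
    return (expires_on - today).days


def preflight(credentials: dict[str, date], today: date,
              warn_within_days: int = 7) -> list[tuple[str, int]]:
    """(name, days_left) for credentials inside the warning window, soonest first."""
    rows = [(name, days_to_expiry(exp, today)) for name, exp in credentials.items()]
    return sorted((r for r in rows if r[1] <= warn_within_days), key=lambda r: r[1])
```

With made-up dates, a token that expired yesterday and one expiring in two days would both appear in the result (as -1 and 2 days), while one expiring in two months would not; the output is what would feed the Hard-stop lane.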

C5 · Wire the marketing lanes' review loop first

You'll be working with growth/marketing (the Sonnet-5 leads) daily — their preview-URL proof loop doubles as the pilot for your review cadence, on the lanes where your taste matters most. Start the Ali-review habit where you already are.

Section 05

The flip checklist — critical path to "runs two weeks without Ali"

1. Config table: the 62-agent mapping in Donna's format (AIOPS-263) → Donna reviews.

2. Two canaries: Galadriel on Hermes (memory + recall + delegation) and the Claude-auth canary.

3. COO disposal: fold into Jessica's routing; Donna keeps the skepticism.

4. Routines + guardrails: test-fire 2–3 routines; watchdogs on lane roots; iteration caps.

5. Pull surface v1 live: mobile-first, with the heartbeat + dead-man ping.

6. Frodo merges one real PR: PR #121 / ACAD-137, preview URL attached.

7. Memory bootstrap: Tier-1 MEMORY.md per head + weekly distill; Finch owns.

8. You step back: daily 15-min pull review; 48h Needs-Ali escalation rule.

Everything on this list is reversible except #3 — and even that is receipts-first.

The departure rule (the adversarial pass's sharpest addition)

If your marketing shift arrives before this checklist completes, the default is: do NOT flip the remaining heads under time pressure. A half-flipped org (one canary live, seven heads unproven) is worse than a parked one: it's neither safe to leave alone nor staffed for real autonomy. Anything time-critical stays on the already-proven direct-Codex build lane until the checklist is genuinely green.

Section 06

What's already solid (verified — don't reopen)

Donna's Jessica-stays boundary removed the two riskiest steps (org surgery + a false-done CEO record) — better than our proposal, and it held up under this review.
The board-map closed the orphan problem: 6 issues updated, 4 created, the /lesson chat-first direction now owned (ACAD-137), nothing from the discussion thread left untracked.
The placement rubric (hook = guarantee, skill = default, subagent = isolation never tidiness, Paperclip = persistent) survived adversarial reading unchanged — the keeper artifact of the cockpit work.
Repair → verify → flip sequencing and the ship-ranked "build 5, park the rest" discipline are the right shape; this review only re-orders what's inside them.
Skills-canon delegation (Donna + heads pick, Finch merges, trim to role) removed a founder bottleneck cleanly.

Where to start

Four things must be true before you go dark: the dead-man switch, the Claude-auth canary, Frodo's first real merge, and a pull surface that exists — everything else can land while you're away.