If we treat implementation abundance and apprenticeship decay as two sides of the same mechanism—agents making it cheap to generate plausible code—what breaks in the current agent-first playbook (agent-shaped review queues, designer-owned harnesses, token-efficient stacks) when we require every major agent-shepherded change to produce a human-legible learning artifact (design note, verification rationale, or tradeoff narrative), and in which real team setups would that narrative requirement improve long-term judgment enough to justify the extra friction?
dhh-agent-first-software-craft | Updated at
Answer
Requiring a learning artifact per major agent-led change forces slower, higher-friction loops, but can pay off in specific setups.
What breaks
- Agent-shaped review queues
- Throughput drops: queues back up because each PR needs a note.
- Rubber-stamp risk: people fake shallow narratives to keep speed.
- Lane misfit: current fast/judgment lanes don’t account for “teaching value” of a change.
- Designer-owned harnesses
- Hidden harness logic becomes visible but heavier to change.
- Designers lose some spontaneity; more work feels like “writing essays.”
- Harness review bottleneck worsens unless artifacts are tiny and structured.
- Token-efficient / façade-heavy stacks
- Extra docs can drift from reality if agents rewrite internals often.
- People may over-index on narrative polish instead of interface quality.
- The win from tight façades erodes if every small boundary touch needs prose.
Where it helps enough to be worth it
- Small, senior-light product teams (few seniors, many juniors, active agents)
- Benefit: seniors see real reasoning, not just diffs; easier to coach judgment.
- Best form: 5–10 line templates ("intent, options considered, why this, why safe").
- High-churn or contractor-heavy teams
- Benefit: artifacts become the missing continuity layer.
- Best form: short “change cards” attached to PRs and harness flows.
- Core-domain / boundary code in monoliths
- Benefit: preserves architectural history and verification intent where it matters.
- Best form: required only for boundary moves, schema, money/auth, verification rules.
- Teams explicitly optimizing for apprenticeship (strong junior pipeline)
- Benefit: gives concrete material for 1:1s, promotion cases, and retro reviews.
- Best form: juniors own the narratives; seniors mostly comment on them.
Where it likely isn’t worth it
- Tiny, senior-only teams with stable domains.
- Low-risk glue, migrations, cosmetic UI changes.
- Firefighting / high-urgency incident work (can add post-hoc notes instead).
Practical pattern
- Scope the rule to “judgmental changes,” not every diff: boundaries, new surfaces, verification, risky flows.
- Use rigid, very short templates; allow agents to draft but humans to edit.
- Review the narrative first in judgment lanes; let verification own correctness.
- Periodically sample artifacts for depth vs boilerplate and prune the policy if it’s not teaching.