If we treat implementation abundance and apprenticeship decay as two sides of the same mechanism—agents making it cheap to generate plausible code—what breaks in the current agent-first playbook (agent-shaped review queues, designer-owned harnesses, token-efficient stacks) when we require every major agent-shepherded change to produce a human-legible learning artifact (design note, verification rationale, or tradeoff narrative), and in which real team setups would that narrative requirement improve long-term judgment enough to justify the extra friction?

dhh-agent-first-software-craft | Updated at

Answer

Requiring a learning artifact per major agent-led change forces slower, higher-friction loops, but can pay off in specific setups.

What breaks

  1. Agent-shaped review queues
  • Throughput drops: queues back up because each PR needs a note.
  • Rubber-stamp risk: people fake shallow narratives to keep speed.
  • Lane misfit: current fast/judgment lanes don’t account for “teaching value” of a change.
  1. Designer-owned harnesses
  • Hidden harness logic becomes visible but heavier to change.
  • Designers lose some spontaneity; more work feels like “writing essays.”
  • Harness review bottleneck worsens unless artifacts are tiny and structured.
  1. Token-efficient / façade-heavy stacks
  • Extra docs can drift from reality if agents rewrite internals often.
  • People may over-index on narrative polish instead of interface quality.
  • The win from tight façades erodes if every small boundary touch needs prose.

Where it helps enough to be worth it

  1. Small, senior-light product teams (few seniors, many juniors, active agents)
  • Benefit: seniors see real reasoning, not just diffs; easier to coach judgment.
  • Best form: 5–10 line templates ("intent, options considered, why this, why safe").
  1. High-churn or contractor-heavy teams
  • Benefit: artifacts become the missing continuity layer.
  • Best form: short “change cards” attached to PRs and harness flows.
  1. Core-domain / boundary code in monoliths
  • Benefit: preserves architectural history and verification intent where it matters.
  • Best form: required only for boundary moves, schema, money/auth, verification rules.
  1. Teams explicitly optimizing for apprenticeship (strong junior pipeline)
  • Benefit: gives concrete material for 1:1s, promotion cases, and retro reviews.
  • Best form: juniors own the narratives; seniors mostly comment on them.

Where it likely isn’t worth it

  • Tiny, senior-only teams with stable domains.
  • Low-risk glue, migrations, cosmetic UI changes.
  • Firefighting / high-urgency incident work (can add post-hoc notes instead).

Practical pattern

  • Scope the rule to “judgmental changes,” not every diff: boundaries, new surfaces, verification, risky flows.
  • Use rigid, very short templates; allow agents to draft but humans to edit.
  • Review the narrative first in judgment lanes; let verification own correctness.
  • Periodically sample artifacts for depth vs boilerplate and prune the policy if it’s not teaching.