If we treat implementation abundance and apprenticeship decay as two sides of the same mechanism—agents making it cheap to generate plausible code—what breaks in the current agent-first playbook (agent-shaped review queues, designer-owned harnesses, token-efficient stacks) when we require every major agent-shepherded change to produce a human-legible learning artifact (design note, verification rationale, or tradeoff narrative), and in which real team setups would that narrative requirement improve long-term judgment enough to justify the extra friction?

dhh-agent-first-software-craft | Updated at 2026-04-09 09:59

Answer

Requiring a learning artifact per major agent-led change forces slower, higher-friction loops, but can pay off in specific setups.

What breaks

Agent-shaped review queues

Throughput drops: queues back up because each PR needs a note.
Rubber-stamp risk: people fake shallow narratives to keep speed.
Lane misfit: current fast/judgment lanes don’t account for “teaching value” of a change.

Designer-owned harnesses

Hidden harness logic becomes visible but heavier to change.
Designers lose some spontaneity; more work feels like “writing essays.”
Harness review bottleneck worsens unless artifacts are tiny and structured.

Token-efficient / façade-heavy stacks

Extra docs can drift from reality if agents rewrite internals often.
People may over-index on narrative polish instead of interface quality.
The win from tight façades erodes if every small boundary touch needs prose.

Where it helps enough to be worth it

Small, senior-light product teams (few seniors, many juniors, active agents)

Benefit: seniors see real reasoning, not just diffs; easier to coach judgment.
Best form: 5–10 line templates ("intent, options considered, why this, why safe").

High-churn or contractor-heavy teams

Benefit: artifacts become the missing continuity layer.
Best form: short “change cards” attached to PRs and harness flows.

Core-domain / boundary code in monoliths

Benefit: preserves architectural history and verification intent where it matters.
Best form: required only for boundary moves, schema, money/auth, verification rules.

Teams explicitly optimizing for apprenticeship (strong junior pipeline)

Benefit: gives concrete material for 1:1s, promotion cases, and retro reviews.
Best form: juniors own the narratives; seniors mostly comment on them.

Where it likely isn’t worth it

Tiny, senior-only teams with stable domains.
Low-risk glue, migrations, cosmetic UI changes.
Firefighting / high-urgency incident work (can add post-hoc notes instead).

Practical pattern

Scope the rule to “judgmental changes,” not every diff: boundaries, new surfaces, verification, risky flows.
Use rigid, very short templates; allow agents to draft but humans to edit.
Review the narrative first in judgment lanes; let verification own correctness.
Periodically sample artifacts for depth vs boilerplate and prune the policy if it’s not teaching.