If we treat the review bottleneck not as a safety valve but as a potentially distorting lens—where only code that enters the PR queue gets serious human judgment—how would team design and harness ergonomics change if we instead made production behavior (contracts, runtime checks, incident traces) the primary object of review, and in what situations would this outcome-first frame surface different risks or opportunities than the current diff-first, lane-centric frame?

dhh-agent-first-software-craft

Answer

Shift reviews from diffs to behavior: redesign teams and harnesses so humans mostly review contracts, runtime signals, and incidents, with diffs as a secondary artifact.

Team design

  • Assign explicit ownership by behavior domain (money, auth, UX flows) rather than by repo.
  • Focus senior roles on contract design, SLOs, and incident triage rather than per-PR nitpicks.
  • Have agents and juniors implement against existing contracts; escalate to human review only when observed behavior or a contract changes.
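The escalation rule above can be sketched as a small routing function. This is a minimal sketch, assuming each change can be tagged with the behavior domain it touches and the contract IDs it edits; `DOMAIN_OWNERS`, `Change`, and `route` are hypothetical names for illustration, not part of any existing harness.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ownership map: behavior domain -> human owner (not repo -> owner).
DOMAIN_OWNERS = {
    "money": "payments-owner",
    "auth": "identity-owner",
    "ux-flows": "product-owner",
}


@dataclass(frozen=True)
class Change:
    """A proposed change described by its effect on behavior, not by the files it touches."""

    domain: str
    contracts_edited: frozenset[str]  # IDs of contracts whose specification this change edits
    behavior_changed: bool  # True if a checked output or metric moved (e.g. in staging)


def route(change: Change) -> str | None:
    """Return the owner who must review this change, or None if passing contract checks suffices."""
    if change.contracts_edited or change.behavior_changed:
        # Escalate to the domain owner; domains without an owner fall back to a triage queue.
        return DOMAIN_OWNERS.get(change.domain, "triage")
    # Implementation-only change against unchanged contracts: no human escalation.
    return None


if __name__ == "__main__":
    print(route(Change("money", frozenset(), False)))  # None
    print(route(Change("money", frozenset({"refund-v2"}), False)))  # payments-owner
    print(route(Change("search", frozenset(), True)))  # triage
```

The design choice here is that an agent's refactor which leaves every contract and checked output unchanged never enters a human queue, while anything that edits a contract does, regardless of how small the diff is.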

Harness ergonomics

  • PR view: lead with “what behavior changed, or will change?” (contract edits, added or removed checks, metric deltas), then show the diff.
  • Tight links from incidents and traces back to the relevant contracts and their responsible owners.
  • Low-friction tools for adding or strengthening runtime checks, assertions, and feature flags directly from the harness.
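One way to make runtime checks cheap to add and traceable back to an owner is a decorator that verifies a postcondition and records any violation tagged with the contract ID and owner. This is a minimal sketch: `Contract`, `checked`, the in-memory `VIOLATIONS` list (a stand-in for a real trace or incident sink), and the refund example are all hypothetical.

```python
from __future__ import annotations

import functools
import logging
from dataclasses import dataclass
from typing import Any, Callable

log = logging.getLogger("contracts")


@dataclass(frozen=True)
class Contract:
    """A reviewable behavior contract: an ID, an owner, and a check on outputs."""

    contract_id: str
    owner: str
    postcondition: Callable[[Any], bool]


# Hypothetical stand-in for a trace/incident sink; a real harness would emit spans or events.
VIOLATIONS: list[dict[str, str]] = []


def checked(contract: Contract):
    """Decorator: run the wrapped function, then verify its result against the contract."""

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if not contract.postcondition(result):
                # Record enough context for an incident to link back to the contract and owner.
                event = {
                    "contract": contract.contract_id,
                    "owner": contract.owner,
                    "function": fn.__qualname__,
                }
                VIOLATIONS.append(event)
                log.warning("contract violated: %s", event)
            return result  # observe-only: the caller still gets the result

        return wrapper

    return decorate


REFUND_NON_NEGATIVE = Contract(
    "money/refund-non-negative", "payments-owner", lambda cents: cents >= 0
)


@checked(REFUND_NON_NEGATIVE)
def refund_amount(paid_cents: int, fee_cents: int) -> int:
    # Deliberately naive: a fee larger than the payment yields a negative refund.
    return paid_cents - fee_cents


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    refund_amount(1000, 100)  # satisfies the contract, nothing recorded
    refund_amount(500, 700)  # returns -200 and records a violation
    print(VIOLATIONS)
```

The sketch records and continues rather than raising, so a new check can ship in observe-only mode before it is trusted to block; a stricter variant would raise once the contract is settled.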

Where this frame differs from diff-first

  • Surfaces more: long-lived latent bugs, cross-PR coupling, and missing checks that never show up in any individual diff.
  • Hides more: purely local craft issues, style drift, and subtle design regressions that don’t yet show up in runtime signals.
  • Supports more ambitious changes where outcomes are measurable (infra, data flows, systems with many backends); is weaker where behavior is fuzzy or low-volume (novel UX, early-stage features).
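The point about missing checks can be made concrete with an audit that looks at every contract at once, which no single diff does. A minimal sketch, assuming a hypothetical registry of `ContractRecord` entries and a set of contract IDs that recent production traces actually exercised; both are illustrative, not an existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContractRecord:
    """Hypothetical registry entry for one behavior contract."""

    contract_id: str
    owner: str | None
    runtime_checks: list[str] = field(default_factory=list)  # names of checks enforcing it


def audit(registry: list[ContractRecord], exercised: set[str]) -> list[str]:
    """Flag gaps that only appear when the whole registry is viewed at once."""
    findings: list[str] = []
    for c in registry:
        if c.owner is None:
            findings.append(f"{c.contract_id}: no owner")
        if not c.runtime_checks:
            findings.append(f"{c.contract_id}: no runtime check")
        elif c.contract_id not in exercised:
            # Checks exist but recent traces never hit them: possibly dead or untested behavior.
            findings.append(f"{c.contract_id}: checks never exercised in recent traces")
    return findings


if __name__ == "__main__":
    registry = [
        ContractRecord("money/refund-non-negative", "payments-owner", ["refund_non_negative"]),
        ContractRecord("auth/session-expiry", None, []),
        ContractRecord("money/fx-rounding", "payments-owner", ["fx_round_half_even"]),
    ]
    for finding in audit(registry, exercised={"money/refund-non-negative"}):
        print(finding)
```

Each finding here spans many past PRs: a contract that shipped without a check, or whose check never fires, looks fine in every individual diff.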

Net: outcome-first review shifts human judgment toward contract quality and production safety at the cost of weaker line-level craft visibility; it’s best in well-instrumented, high-signal domains and risky if teams underinvest in observability or taste ownership.