In small, implementation-abundant monolith teams that already use drift metrics, change-intent lanes, and bug-class verification, what concrete signs in day-to-day work (e.g., PR rework patterns, incident postmortems, standup discussions) show that the review bottleneck has shifted from diff-level correctness to higher-order judgment—such as problem framing, boundary reshaping, or pattern blessing—and how could harnesses be tuned to surface exactly those higher-order decisions earlier in the loop instead of at final review time?

dhh-agent-first-software-craft

Answer

Signs and harness tweaks, in compact form.

Concrete day-to-day signs the bottleneck is now higher-order

  1. PR / review patterns
  • Many "scope / framing" comments, few bug comments
    • Reviews say "wrong problem", "collapse these flows", or "this belongs in boundary X", rather than "null check" or "off-by-one".
  • Late rewrites of otherwise-correct diffs
    • Agents produce mergeable code, but senior reviewers ask for major reshapes (move the feature to another module, change the API, delete and re-scope).
  • Repeated pattern or boundary debates on similar PRs
    • The same arguments recur about where logic lives or which helper/pattern to use; correctness is rarely contested.
  • Many reopened or follow-up PRs for design, not bugs
    • Post-merge follow-ups mostly rename, extract, move, or unify patterns; few fix functional regressions.
  2. Incident and postmortem patterns
  • Incidents stem from "wrong behavior" or coupling, not simple mistakes
    • Root causes are a wrong boundary, a missing domain rule, confused ownership, or over-coupled flows, not syntax errors or trivial races.
  • Fix is conceptual, not local
    • Postmortem fixes are a new abstraction, a moved boundary, or a new policy in the harness, not only "add a test" or "fix the index".
  • Same conceptual issue appears across areas
    • Several incidents trace back to an unclear contract, pattern, or boundary that review never forced the team to settle earlier.
  3. Standups / planning / ritual patterns
  • Standups dominated by "what are we really solving?" questions
    • Time goes to reframing tickets, merging or splitting work, and redefining acceptance criteria; very little goes to "can we build it?"
  • PRs called out for "needs taste/arch review" even when tests are green
    • Teams route work based on judgment needs, not correctness risk.
  • Design docs lag behind PRs
    • Review conversations keep redoing design that was never pinned down before coding.
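One of these signals, "late rewrites of otherwise-correct diffs", is easy to measure. The following is a minimal sketch, assuming you can export per-PR diff size at first review, lines changed after first review, CI status at first review, and the number of comments tagged as correctness. The `PullRequest` record, its fields, and the 0.5 churn threshold are hypothetical choices, not part of any existing tool.

```python
# Sketch: flag PRs that were green at first review but heavily reshaped afterwards.
# PullRequest and its fields are hypothetical; populate them from your own
# forge/CI data. The 0.5 churn threshold is an assumption to tune per team.
from dataclasses import dataclass


@dataclass
class PullRequest:
    number: int
    lines_at_first_review: int       # diff size when review started
    lines_changed_after_review: int  # lines added/removed after first review
    tests_green_at_first_review: bool
    bug_comments: int                # review comments tagged as correctness


def is_late_reshape(pr: PullRequest, churn_threshold: float = 0.5) -> bool:
    """True if a correct-looking diff was substantially rewritten in review."""
    if not pr.tests_green_at_first_review or pr.lines_at_first_review == 0:
        return False
    churn = pr.lines_changed_after_review / pr.lines_at_first_review
    # Heavy churn with no correctness comments suggests the rework was about
    # framing, boundaries, or patterns rather than bugs.
    return churn >= churn_threshold and pr.bug_comments == 0


def late_reshape_rate(prs: list[PullRequest]) -> float:
    """Share of PRs that were late-reshaped; 0.0 for an empty list."""
    if not prs:
        return 0.0
    return sum(is_late_reshape(pr) for pr in prs) / len(prs)


if __name__ == "__main__":
    sample = [
        PullRequest(101, 200, 150, True, 0),  # green, heavily reshaped: counts
        PullRequest(102, 120, 10, True, 2),   # small bug fixes: does not count
        PullRequest(103, 80, 90, False, 0),   # red at review: does not count
        PullRequest(104, 300, 20, True, 0),   # green, light touch: does not count
    ]
    print(f"late reshape rate: {late_reshape_rate(sample):.2f}")  # prints 0.25
```

A rate that rises over several sprints while bug comments stay flat matches the pattern described above; a single sprint's number says little on its own.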

Harness tuning to surface higher-order decisions earlier

  1. Make judgment-heavy work explicit via lanes
  • Add or refine lanes:
    • boundary_reshape, new_pattern, contract_change, scope_question.
  • Auto-suggest lanes from diffs:
    • Many files across boundaries → suggest boundary_reshape.
    • New core helper/service/DSL → suggest new_pattern.
    • Changes to public APIs/events → suggest contract_change.
  • Require a tiny decision stub in these lanes:
    • 3 bullets: "What changed", "Options considered", "Why this".
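The diff heuristics for auto-suggesting lanes can be sketched as a small rule set. This is a minimal sketch assuming a Rails-style monolith layout: treating the directory under `app/` as the "boundary", the `PUBLIC_CONTRACT_PREFIXES` and `CORE_HELPER_PREFIXES` paths, and the three-boundary threshold are all hypothetical and should be replaced with your own module map.

```python
# Sketch: suggest judgment lanes from the list of changed files in a PR.
# All paths and thresholds below are hypothetical examples, not a standard.
from dataclasses import dataclass

PUBLIC_CONTRACT_PREFIXES = ("app/controllers/api/", "app/events/", "config/routes.rb")
CORE_HELPER_PREFIXES = ("app/services/", "lib/")
BOUNDARY_SPREAD_THRESHOLD = 3  # touching 3+ boundaries suggests a reshape


@dataclass
class FileChange:
    path: str
    is_new: bool = False


def boundary_of(path: str) -> str:
    """Map a path to a coarse boundary, e.g. 'app/models/x.rb' -> 'models'."""
    parts = path.split("/")
    if parts[0] == "app" and len(parts) > 2:
        return parts[1]
    return parts[0]


def suggest_lanes(changes: list[FileChange]) -> set[str]:
    lanes: set[str] = set()
    # Files spread across many boundaries -> boundary_reshape.
    if len({boundary_of(c.path) for c in changes}) >= BOUNDARY_SPREAD_THRESHOLD:
        lanes.add("boundary_reshape")
    # A brand-new core helper/service -> new_pattern.
    if any(c.is_new and c.path.startswith(CORE_HELPER_PREFIXES) for c in changes):
        lanes.add("new_pattern")
    # Any touch to a public API or event definition -> contract_change.
    if any(c.path.startswith(PUBLIC_CONTRACT_PREFIXES) for c in changes):
        lanes.add("contract_change")
    return lanes


if __name__ == "__main__":
    changes = [
        FileChange("app/models/invoice.rb"),
        FileChange("app/controllers/api/invoices_controller.rb"),
        FileChange("app/jobs/invoice_sync_job.rb"),
        FileChange("app/services/invoice_rounding.rb", is_new=True),
    ]
    # prints ['boundary_reshape', 'contract_change', 'new_pattern']
    print(sorted(suggest_lanes(changes)))
```

The rules only suggest lanes; the author or reviewer still confirms them, which keeps false positives cheap.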
  2. Add judgment cards before full diff review
  • For flagged lanes, have the harness/agent emit a short card before human review:
    • Boundary card: "Here’s the current vs proposed call graph and ownership."
    • Pattern card: "This new helper overlaps with A/B; options: bless, localize, reject."
    • Contract card: "Downstream callers; potential breakages; migration sketch."
  • Show these cards in the PR header or as a pre-review CLI step so seniors decide structure before nits.
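Two of the three cards (pattern and contract) can be rendered from data a harness may already have. This is a minimal sketch: the helper registry and the `callers` index are hypothetical inputs you would build from your own symbol index, and `difflib` name similarity is only a cheap stand-in for real overlap detection.

```python
# Sketch: render short judgment cards before line-level review.
# Inputs are hypothetical; name similarity is a rough proxy for overlap.
import difflib


def pattern_card(new_helper: str, existing_helpers: list[str]) -> str:
    # Up to 3 existing helpers whose names are at least 60% similar.
    overlaps = difflib.get_close_matches(new_helper, existing_helpers, n=3, cutoff=0.6)
    lines = [f"Pattern card: new helper `{new_helper}`"]
    if overlaps:
        lines.append("Overlaps with: " + ", ".join(overlaps))
    else:
        lines.append("No close overlap found by name.")
    lines.append("Options: bless | localize | reject")
    return "\n".join(lines)


def contract_card(changed_symbols: list[str], callers: dict[str, list[str]]) -> str:
    # callers maps a public symbol to the places that call it.
    lines = ["Contract card"]
    for symbol in changed_symbols:
        downstream = sorted(callers.get(symbol, []))
        detail = f" ({', '.join(downstream)})" if downstream else ""
        lines.append(f"- {symbol}: {len(downstream)} downstream caller(s){detail}")
    lines.append("Potential breakages and migration sketch: <filled in by author/agent>")
    return "\n".join(lines)
```

Keeping each card to a few lines is deliberate: the goal is a structural decision (bless, localize, reject, migrate), not a second diff to read.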
  3. Pull higher-order questions into standup / kickoff
  • For PRs / tickets in these lanes, auto-generate prompts:
    • "What boundary owns this?", "Should this be a shared pattern?", "What’s the minimal contract change?"
  • Surface them on standup boards:
    • A column for "Open boundary/pattern calls", each with an owner and a due time.
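The prompts and the board column can be generated from open items in judgment lanes. A minimal sketch, assuming each open call carries a lane, an owner, and a due date; the `OpenCall` record and the row format are hypothetical, and the prompt text reuses the questions above (plus the "what are we really solving?" framing for `scope_question`).

```python
# Sketch: turn open judgment-lane items into standup prompts and board rows.
# OpenCall and the row format are hypothetical, not any board tool's schema.
from dataclasses import dataclass
from datetime import date

LANE_PROMPTS = {
    "boundary_reshape": "What boundary owns this?",
    "new_pattern": "Should this be a shared pattern?",
    "contract_change": "What's the minimal contract change?",
    "scope_question": "What are we really solving?",
}


@dataclass
class OpenCall:
    item: str   # PR or ticket reference
    lane: str
    owner: str
    due: date


def standup_column(calls: list[OpenCall], today: date) -> list[str]:
    """Rows for an 'Open boundary/pattern calls' column, sorted by due date."""
    rows = []
    for call in sorted(calls, key=lambda c: c.due):
        prompt = LANE_PROMPTS.get(call.lane, "What decision is needed?")
        flag = "OVERDUE " if call.due < today else ""
        rows.append(
            f"{flag}{call.item} [{call.lane}] {prompt} "
            f"owner={call.owner} due={call.due.isoformat()}"
        )
    return rows


if __name__ == "__main__":
    calls = [
        OpenCall("PR 412", "new_pattern", "maria", date(2024, 5, 3)),
        OpenCall("TICKET-88", "boundary_reshape", "sam", date(2024, 5, 1)),
    ]
    for row in standup_column(calls, today=date(2024, 5, 2)):
        print(row)
    # OVERDUE TICKET-88 [boundary_reshape] What boundary owns this? owner=sam due=2024-05-01
    # PR 412 [new_pattern] Should this be a shared pattern? owner=maria due=2024-05-03
```

Sorting by due date puts overdue calls at the top, so standup time goes to the decisions that are already blocking work.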
  4. Use metrics to confirm the bottleneck shift
  • Track review-tag ratios:
    • Tag comments as correctness, style, boundary, pattern, or scope; a rising share of boundary, pattern, and scope comments is your quantitative signal.
  • Track rework reasons on PRs:
    • A small template at merge: what were the main changes after first review? (bugs, naming, boundary, pattern, scope)
  • Feed these back into harness defaults:
    • If many PRs in area X get boundary rework, auto-escalate new PRs there to the boundary_reshape lane with a required decision stub.
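The tag ratio and the escalation rule fit in a few functions. This is a minimal sketch, assuming one rework reason per merged PR (from the merge template) and a coarse "area" key per PR; the 0.3 threshold and the five-PR minimum sample are hypothetical defaults.

```python
# Sketch: review-tag ratio and rework-driven lane escalation.
# Thresholds and the "area" key are assumptions; tags follow the list above.
from collections import Counter

HIGHER_ORDER_TAGS = {"boundary", "pattern", "scope"}


def higher_order_share(comment_tags: list[str]) -> float:
    """Fraction of review comments tagged boundary, pattern, or scope."""
    if not comment_tags:
        return 0.0
    return sum(tag in HIGHER_ORDER_TAGS for tag in comment_tags) / len(comment_tags)


def areas_to_escalate(rework_reasons: list[tuple[str, str]],
                      threshold: float = 0.3, min_prs: int = 5) -> set[str]:
    """Areas where boundary rework is common enough to escalate new PRs.

    rework_reasons holds one (area, reason) pair per merged PR.
    """
    totals = Counter(area for area, _ in rework_reasons)
    boundary = Counter(area for area, reason in rework_reasons if reason == "boundary")
    return {
        area for area, total in totals.items()
        # Require a minimum sample so one noisy PR doesn't escalate an area.
        if total >= min_prs and boundary[area] / total >= threshold
    }


def default_lanes(area: str, escalated: set[str], suggested: set[str]) -> set[str]:
    """Add boundary_reshape (with its decision stub) for escalated areas."""
    lanes = set(suggested)
    if area in escalated:
        lanes.add("boundary_reshape")
    return lanes
```

The minimum-sample guard is a design choice: without it, a single reworked PR in a quiet area would push every later PR there into the heavier lane.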
  5. Let the harness gate on unresolved high-level decisions, not just tests
  • For judgment lanes, block merge unless:
    • Required decision stub is filled.
    • A reviewer explicitly toggles one of: boundary_approved, pattern_blessed, local_violation_ok.
  • Keep implementation checks light in these lanes so attention goes to the decision rather than to more correctness noise.
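The merge gate reduces to two checks for judgment lanes: a filled decision stub and at least one reviewer toggle. A minimal sketch, assuming the stub lives in the PR description as "Heading: text" lines and the toggles are PR labels; both storage choices are hypothetical, while the headings and toggle names come from the text above.

```python
# Sketch: block merge in judgment lanes until the decision is recorded.
# Storage (description lines, label names) is an assumption.
JUDGMENT_LANES = {"boundary_reshape", "new_pattern", "contract_change", "scope_question"}
STUB_HEADINGS = ("What changed", "Options considered", "Why this")
APPROVAL_TOGGLES = {"boundary_approved", "pattern_blessed", "local_violation_ok"}


def stub_sections(description: str) -> dict[str, str]:
    """Parse 'Heading: text' lines (optionally bulleted) for the stub headings."""
    sections = {}
    for raw in description.splitlines():
        line = raw.strip().lstrip("-*• ").strip()
        for heading in STUB_HEADINGS:
            if line.startswith(heading + ":"):
                sections[heading] = line[len(heading) + 1:].strip()
    return sections


def merge_blockers(lanes: set[str], description: str, labels: set[str]) -> list[str]:
    """Empty list means mergeable; otherwise the reasons merge is blocked."""
    if not lanes & JUDGMENT_LANES:
        return []  # ordinary lanes: rely on tests and normal review
    blockers = []
    sections = stub_sections(description)
    for heading in STUB_HEADINGS:
        if not sections.get(heading):
            blockers.append(f"decision stub missing '{heading}'")
    if not labels & APPROVAL_TOGGLES:
        blockers.append("no reviewer decision toggle set")
    return blockers
```

The gate adds no extra correctness checks of its own, which matches the advice to keep implementation checks light in these lanes.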

Net effect

  • When these signs appear and you encode them in lanes, cards, and simple gates, the review bottleneck visibly moves from diff correctness to a thinner stream of deliberate boundary/pattern/scope calls made earlier in the loop.