In small, implementation-abundant monolith teams that already use drift metrics, change-intent lanes, and bug-class verification, what concrete signs in day-to-day work (e.g., PR rework patterns, incident postmortems, standup discussions) show that the review bottleneck has shifted from diff-level correctness to higher-order judgment—such as problem framing, boundary reshaping, or pattern blessing—and how could harnesses be tuned to surface exactly those higher-order decisions earlier in the loop instead of at final review time?
dhh-agent-first-software-craft
Answer
Signs and harness tweaks, in compact form.
Concrete day-to-day signs that the bottleneck is now higher-order
- PR / review patterns
- Many "scope / framing" comments, few bug comments
- Review comments say "wrong problem", "collapse these flows", or "this belongs in the X boundary" rather than "null check" or "off-by-one".
- Late rewrites of otherwise-correct diffs
- Agents produce merge-worthy code, but seniors ask for major reshapes (move the feature to another module, change the API, delete and re-scope).
- Repeated pattern or boundary debates on similar PRs
- The same arguments recur about where logic lives or which helper/pattern to use; correctness is rarely contested.
- High re-open / follow-up PR rate for design, not bugs
- Post-merge follow-ups mostly rename, extract, move, or unify patterns; few fix functional regressions.
- Incident and postmortem patterns
- Incidents stem from "wrong behavior" or coupling, not simple mistakes
- Root causes are a wrong boundary, a missing domain rule, confused ownership, or over-coupled flows, not a syntax error or a trivial race.
- Fix is conceptual, not local
- Postmortem remediations are a new abstraction, a boundary move, or a new harness policy, not just "add test" or "fix index".
- Same conceptual issue appears across areas
- Several incidents trace back to the same unclear contract, pattern, or boundary that review never forced a decision on earlier.
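Spotting "the same conceptual issue across areas" mechanically requires postmortems to carry a root-cause concept tag. The sketch below assumes such a tag exists (the `Postmortem` record, the `contract:...` label format, and the incident IDs are all hypothetical) and flags any concept that recurs in two or more areas.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Postmortem:
    incident_id: str  # hypothetical incident ID
    area: str         # module or team area where the incident surfaced
    root_cause: str   # hypothetical concept tag, e.g. "contract:order-events"


def recurring_concepts(postmortems, min_areas=2):
    """Return root-cause tags seen in at least `min_areas` distinct areas."""
    areas_by_cause = defaultdict(set)
    for pm in postmortems:
        areas_by_cause[pm.root_cause].add(pm.area)
    return {
        cause: sorted(areas)
        for cause, areas in areas_by_cause.items()
        if len(areas) >= min_areas
    }


history = [
    Postmortem("INC-1", "checkout", "contract:order-events"),
    Postmortem("INC-2", "invoicing", "contract:order-events"),
    Postmortem("INC-3", "checkout", "local:typo"),
]
print(recurring_concepts(history))  # {'contract:order-events': ['checkout', 'invoicing']}
```

A concept that keeps surfacing in different areas is a candidate for an explicit contract or boundary decision rather than another local fix.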
- Standups / planning / ritual patterns
- Standups dominated by "what are we really solving?" questions
- Time goes to reframing tickets, merging or splitting work, and redefining acceptance criteria; very little goes to "can we build it?".
- PRs get flagged "needs taste/arch review" even when tests are green
- Teams route work based on judgment needs, not correctness risk.
- Design docs lag behind PRs
- Review conversations keep redoing design that was never pinned down before coding.
Harness tuning to surface higher-order decisions earlier
- Make judgment-heavy work explicit via lanes
- Add or refine lanes: boundary_reshape, new_pattern, contract_change, scope_question.
- Auto-suggest lanes from diffs:
- Many files across boundaries → suggest boundary_reshape.
- New core helper/service/DSL → suggest new_pattern.
- Changes to public APIs/events → suggest contract_change.
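A minimal sketch of lane auto-suggestion, assuming hypothetical repo conventions: the directory directly under `app/` names a boundary, new files under `app/core/` count as shared helpers, and anything under `app/api/` or `app/events/` is a public surface. The "more than two boundaries" threshold is an arbitrary starting point, not a recommendation.

```python
from dataclasses import dataclass

# Hypothetical repo conventions; adjust to your monolith's layout.
CORE_PREFIX = "app/core/"                      # shared helpers/services
PUBLIC_SURFACES = ("app/api/", "app/events/")  # public APIs and events


@dataclass
class FileChange:
    path: str
    added: bool = False  # True if the file is new in this diff


def boundary_of(path):
    """Use the directory under app/ as the boundary name."""
    parts = path.split("/")
    if parts[0] == "app" and len(parts) > 2:
        return parts[1]
    return parts[0]


def suggest_lanes(changes, max_boundaries=2):
    """Suggest judgment lanes from the shape of a diff."""
    lanes = set()
    if len({boundary_of(c.path) for c in changes}) > max_boundaries:
        lanes.add("boundary_reshape")
    if any(c.added and c.path.startswith(CORE_PREFIX) for c in changes):
        lanes.add("new_pattern")
    if any(c.path.startswith(PUBLIC_SURFACES) for c in changes):
        lanes.add("contract_change")
    return sorted(lanes)


diff = [
    FileChange("app/billing/invoice.rb"),
    FileChange("app/shipping/label.rb"),
    FileChange("app/core/money_formatter.rb", added=True),
    FileChange("app/api/orders.rb"),
]
print(suggest_lanes(diff))  # ['boundary_reshape', 'contract_change', 'new_pattern']
```

These are suggestions only; the author or reviewer still confirms the lane.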
- Require a tiny decision stub in these lanes:
- 3 bullets: "What changed", "Options considered", "Why this".
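The decision stub can be checked mechanically. This sketch assumes a hypothetical PR template in which each of the three bullets appears as a `Heading: text` line; `missing_stub_sections` reports headings that are absent or left blank.

```python
import re

REQUIRED_SECTIONS = ("What changed", "Options considered", "Why this")


def missing_stub_sections(pr_body):
    """Return stub headings that are absent or empty in a PR description.

    Assumes a hypothetical template with lines like 'What changed: <text>'.
    """
    missing = []
    for section in REQUIRED_SECTIONS:
        # Heading at line start, a colon, then at least one non-space character.
        pattern = rf"^{re.escape(section)}:[ \t]*\S"
        if not re.search(pattern, pr_body, re.MULTILINE):
            missing.append(section)
    return missing


body = (
    "What changed: move refund logic into the billing boundary\n"
    "Options considered:\n"
    "Why this: one owner for money flows\n"
)
print(missing_stub_sections(body))  # ['Options considered']
```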
- Add judgment cards before full diff review
- For flagged lanes, have the harness/agent emit a short card before human review:
- Boundary card: "Here’s the current vs proposed call graph and ownership."
- Pattern card: "This new helper overlaps with A/B; options: bless, localize, reject."
- Contract card: "Downstream callers; potential breakages; migration sketch."
- Show these cards in the PR header or as a pre-review CLI step so seniors settle structure before nits.
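A contract card mostly needs one fact the harness can compute: who uses the symbols that changed. The sketch assumes an `import_index` (caller module mapped to the public symbols it uses) produced by some other tool, which is not shown; the caller count is only a proxy for potential breakages, and the migration sketch stays a human-written line.

```python
from collections import defaultdict


def contract_card(changed_symbols, import_index):
    """Build a plain-text contract card listing downstream callers.

    import_index maps a caller module to the set of public symbols it uses;
    how that index is built (static analysis, grep, tracing) is assumed.
    """
    callers = defaultdict(list)
    for module, used in sorted(import_index.items()):
        for symbol in changed_symbols:
            if symbol in used:
                callers[symbol].append(module)
    lines = ["Contract card"]
    for symbol in changed_symbols:
        users = callers.get(symbol, [])
        detail = f" ({', '.join(users)})" if users else ""
        lines.append(f"- {symbol}: {len(users)} downstream caller(s){detail}")
    lines.append("- Migration sketch: <author fills in>")
    return "\n".join(lines)


index = {
    "checkout/cart": {"Order.total", "Order.place"},
    "reports/daily": {"Order.total"},
    "admin/users": {"User.find"},
}
print(contract_card(["Order.total", "Order.place"], index))
```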
- Pull higher-order questions into standup / kickoff
- For PRs / tickets in these lanes, auto-generate prompts:
- "What boundary owns this?", "Should this be a shared pattern?", "What’s the minimal contract change?"
- Surface them on standup boards:
- Column for "Open boundary/pattern calls" with owner and due time.
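Generating the standup prompts can be a plain lookup from lane to question. In this sketch the `BoardItem` shape, the two-day default for `days_to_decide`, the `PR-123` reference, and the `scope_question` prompt (borrowed from the "what are we really solving?" standup question above) are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Lane -> standup prompt; the scope_question wording is an assumption.
LANE_PROMPTS = {
    "boundary_reshape": "What boundary owns this?",
    "new_pattern": "Should this be a shared pattern?",
    "contract_change": "What's the minimal contract change?",
    "scope_question": "What are we really solving?",
}


@dataclass
class BoardItem:
    ref: str       # PR or ticket reference (hypothetical format)
    question: str
    owner: str
    due: date


def open_calls(ref, lanes, owner, today, days_to_decide=2):
    """Turn a PR's judgment lanes into 'Open boundary/pattern calls' items."""
    due = today + timedelta(days=days_to_decide)
    return [BoardItem(ref, LANE_PROMPTS[lane], owner, due)
            for lane in lanes if lane in LANE_PROMPTS]


items = open_calls("PR-123", ["boundary_reshape", "new_pattern"], "sam", date(2024, 5, 6))
for item in items:
    print(item.ref, item.question, item.owner, item.due)
```

Lanes without a prompt are skipped, so ordinary PRs never add noise to the board.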
- Use metrics to confirm the bottleneck shift
- Track review-tag ratios:
- Tag comments as correctness, style, boundary, pattern, or scope; a rising share of boundary, pattern, and scope tags is your quantitative signal.
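Once comments carry one of the five tags, the signal is a simple ratio. This sketch computes the weekly share of boundary, pattern, and scope tags; the week labels and tag lists are toy inputs, not measurements.

```python
from collections import Counter

HIGHER_ORDER = {"boundary", "pattern", "scope"}


def higher_order_share(comment_tags):
    """Fraction of tagged review comments tagged boundary, pattern, or scope."""
    counts = Counter(comment_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[tag] for tag in HIGHER_ORDER) / total


def weekly_trend(tags_by_week):
    """Map each week label to its higher-order share, rounded for display."""
    return {week: round(higher_order_share(tags), 2)
            for week, tags in tags_by_week.items()}


# Toy input, not real data.
weeks = {
    "2024-W18": ["correctness", "correctness", "style", "boundary"],
    "2024-W19": ["correctness", "boundary", "pattern", "scope"],
}
print(weekly_trend(weeks))  # {'2024-W18': 0.25, '2024-W19': 0.75}
```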
- Track rework reasons on PRs:
- Small template on merge: what were the main changes after first review? (bugs, naming, boundary, pattern, scope)
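The merge template's answers can be tallied per area so that recurring rework reasons stand out. This sketch assumes each merged PR yields one `(area, reason)` pair and rejects reasons outside the five listed above; the area names are illustrative.

```python
from collections import Counter, defaultdict

REWORK_REASONS = ("bugs", "naming", "boundary", "pattern", "scope")


def rework_by_area(merge_records):
    """Tally (area, reason) answers from the merge template per area."""
    table = defaultdict(Counter)
    for area, reason in merge_records:
        if reason not in REWORK_REASONS:
            raise ValueError(f"unknown rework reason: {reason!r}")
        table[area][reason] += 1
    return table


records = [
    ("billing", "boundary"),
    ("billing", "boundary"),
    ("billing", "bugs"),
    ("search", "naming"),
]
table = rework_by_area(records)
print(table["billing"].most_common(1))  # [('boundary', 2)]
```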
- Feed these back into harness defaults:
- If many PRs in area X get boundary rework, auto-escalate new PRs there to the boundary_reshape lane with a required decision stub.
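The escalation rule can be a threshold on per-area rework history. In this sketch `threshold=0.4` and `min_prs=5` are hypothetical defaults, `rework_counts` is a plain mapping of area to reason counts, and any judgment lane implies a required decision stub.

```python
JUDGMENT_LANES = {"boundary_reshape", "new_pattern", "contract_change", "scope_question"}


def escalate(area, suggested_lanes, rework_counts, threshold=0.4, min_prs=5):
    """Add boundary_reshape for areas whose history is dominated by boundary rework.

    rework_counts maps area -> {reason: count}; threshold and min_prs are
    hypothetical defaults.
    """
    counts = rework_counts.get(area, {})
    total = sum(counts.values())
    lanes = set(suggested_lanes)
    # `total and ...` guards against dividing by zero for areas with no history.
    if total and total >= min_prs and counts.get("boundary", 0) / total >= threshold:
        lanes.add("boundary_reshape")
    return {
        "lanes": sorted(lanes),
        "decision_stub_required": bool(lanes & JUDGMENT_LANES),
    }


history = {"billing": {"boundary": 3, "bugs": 2}, "search": {"naming": 1}}
print(escalate("billing", [], history))
# {'lanes': ['boundary_reshape'], 'decision_stub_required': True}
```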
- Let the harness gate on unresolved high-level decisions, not just tests
- For judgment lanes, block merge unless:
- Required decision stub is filled.
- A reviewer explicitly toggles one of: boundary_approved, pattern_blessed, local_violation_ok.
- Keep the implementation checks light in these lanes so attention goes to the decision, not more correctness noise.
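A sketch of the merge gate, assuming the harness already knows a PR's lanes, whether its stub is complete, and which reviewer toggles are set (the `PullRequest` record is hypothetical). Tests still block merge, but judgment lanes add exactly two extra conditions and nothing more.

```python
from dataclasses import dataclass, field

JUDGMENT_LANES = {"boundary_reshape", "new_pattern", "contract_change", "scope_question"}
DECISION_TOGGLES = {"boundary_approved", "pattern_blessed", "local_violation_ok"}


@dataclass
class PullRequest:
    lanes: set
    stub_complete: bool
    toggles: set = field(default_factory=set)  # set by reviewers
    tests_green: bool = True


def merge_blockers(pr):
    """Return reasons the PR cannot merge yet; an empty list means mergeable."""
    blockers = []
    if not pr.tests_green:
        blockers.append("tests failing")
    if pr.lanes & JUDGMENT_LANES:
        if not pr.stub_complete:
            blockers.append("decision stub incomplete")
        if not (pr.toggles & DECISION_TOGGLES):
            blockers.append("no reviewer decision toggle")
    return blockers


pr = PullRequest(lanes={"new_pattern"}, stub_complete=True)
print(merge_blockers(pr))  # ['no reviewer decision toggle']
pr.toggles.add("pattern_blessed")
print(merge_blockers(pr))  # []
```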
Net effect
- When these signs appear and you encode them in lanes, cards, and simple gates, the review bottleneck visibly moves from diff correctness to a thinner stream of deliberate boundary/pattern/scope calls made earlier in the loop.