In small, opinionated monolith teams that already use agent lanes, PR escalation rules, and taste owners, where does the review bottleneck actually move next in practice—toward contract design, harness rule changes, or production-incident triage—and how should teams measure that shift to avoid quietly recreating a new, hidden throughput cap around a few seniors?

dhh-agent-first-software-craft

Answer

Likely pattern in such teams:

  1. Where the bottleneck actually moves
  • Primary: contract and interface design
    • Any change that reshapes public APIs, domain boundaries, or user-visible behavior still needs senior taste and system context.
    • PR escalation rules and taste owners route those changes to a small group; that becomes the new choke point.
  • Secondary: harness rules and lane design
    • As more diffs flow through fast lanes, seniors spend more time editing escalation rules, verification heuristics, and harness tools.
    • Those changes are few but high-impact, so they concentrate judgment in the same people who own contracts.
  • Tertiary: incident triage
    • In well-instrumented monoliths, incident volume per se usually stays manageable.
    • The bottleneck is less “time to look at incidents” and more “time to fold incident learnings back into contracts and harness rules,” again hitting the same seniors.

Net: the review bottleneck shifts from per-PR diff review to a narrow slice of senior work on contracts and harness policy. Incident triage reinforces this but rarely dominates on its own.

  2. How to measure the shift and avoid a hidden cap
  Track a few simple metrics, by role and area:
  • Workload concentration
    • Time per week seniors spend on:
      • contract/interface review
      • harness rule / lane changes
      • incident RCA and follow-up
    • % of PRs or lane/policy changes that require the same 1–2 people.
  • Queue and latency
    • Median and p95 lead time for:
      • interface-affecting PRs (vs pure implementation PRs)
      • harness config/rule changes
      • incident follow-up changes (from incident open to fix merged).
  • Escalation and rework
    • % of incidents whose root cause is:
      • unclear/weak contracts
      • missing or stale harness rules
    • % of PRs bounced from fast lanes back to design-level rework.
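
The concentration and latency metrics above can be computed from a plain log of decisions. The following is a minimal sketch in Python, assuming a hypothetical record format of (category, approver, lead time in hours). The sample rows, category names, and function names are invented for illustration and do not come from any particular tool.

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical decision log: (category, approver, lead_time_hours).
# Real data would come from your PR tracker or harness audit log.
decisions = [
    ("contract", "alice", 30.0),
    ("contract", "alice", 52.0),
    ("contract", "bob", 20.0),
    ("harness", "alice", 12.0),
    ("implementation", "carol", 4.0),
    ("implementation", "dan", 6.0),
    ("implementation", "carol", 5.0),
    ("implementation", "erin", 9.0),
]


def top_approver_share(rows, categories):
    """Return (approver, share) for the person who approved the most
    decisions within the given categories; (None, 0.0) if there are none."""
    approvers = [who for cat, who, _ in rows if cat in categories]
    if not approvers:
        return None, 0.0
    name, count = Counter(approvers).most_common(1)[0]
    return name, count / len(approvers)


def lead_time_summary(rows, category):
    """Return median and p95 lead time (hours) for one category,
    or None when the category has no rows."""
    times = sorted(t for cat, _, t in rows if cat == category)
    if not times:
        return None
    if len(times) == 1:
        p95 = times[0]  # a single sample is its own percentile
    else:
        # With n=20 the last cut point is the 95th percentile.
        p95 = quantiles(times, n=20, method="inclusive")[-1]
    return {"median": median(times), "p95": p95}
```

Calling `top_approver_share(decisions, {"contract", "harness"})` gives the workload-concentration figure for design-level decisions, and `lead_time_summary(decisions, "implementation")` gives the baseline to compare interface and harness lead times against.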

Use thresholds like these as starting points and tune them to team size:

  • "Any lane or area where more than 40–50% of contract/harness/incident decisions hit the same senior" → redesign ownership or split taste responsibility.
  • "Interface/harness decisions routinely wait more than 2–3x longer than implementation PRs" → you’ve recreated a hidden bottleneck.
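
The two thresholds can be turned into a small automated check. This is a sketch only: the constants pick one point inside each range quoted above (50% and 2x), the function name is hypothetical, and comparing medians is one assumed reading of "routinely wait longer".

```python
from statistics import median

# Assumed cut-offs chosen from the ranges in the text; tune per team.
CONCENTRATION_LIMIT = 0.5  # share of decisions hitting one senior
LATENCY_RATIO_LIMIT = 2.0  # design-level vs implementation lead time


def hidden_cap_signals(top_share, design_lead_times, impl_lead_times):
    """Return the list of warning signals that fire.

    top_share: fraction of contract/harness/incident decisions made by
        the single most-loaded person (0.0 to 1.0).
    design_lead_times / impl_lead_times: lead times in the same unit.
    """
    signals = []
    if top_share > CONCENTRATION_LIMIT:
        signals.append("concentration")
    if design_lead_times and impl_lead_times:
        baseline = median(impl_lead_times)
        # Skip the ratio when the baseline is zero to avoid dividing by zero.
        if baseline > 0 and median(design_lead_times) / baseline > LATENCY_RATIO_LIMIT:
            signals.append("latency")
    return signals
```

Running a check like this weekly per lane or area, rather than once for the whole team, keeps a local bottleneck from being averaged away.
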

  3. Practical mitigations
  • Distribute contract ownership
    • Appoint small, domain-scoped contract owners (including strong mid-level engineers) instead of one global “architect.”
    • Require simple written contracts/intents so more people can safely review them.
  • Treat harness rules as code, not wizard magic
    • Put rules under version control with clear ownership per area.
    • Add a light review rota so more than one senior can approve rule changes.
  • Make incident follow-up routable
    • Encode the path “playbooks → contract changes → harness rule updates” as an explicit, assignable workflow so incident learnings don’t always land on the same person.
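
One way to make "clear ownership per area" and a "light review rota" checkable is to keep the ownership map itself in version control. The sketch below is illustrative only: `RuleArea`, the area names, and the people are hypothetical, and rotating by week number is just one simple rota scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleArea:
    """Ownership record for one group of harness rules or contracts."""
    name: str
    owner: str
    approvers: tuple[str, ...]


# Hypothetical ownership map, stored next to the rules it describes.
AREAS = [
    RuleArea("billing-contracts", owner="alice", approvers=("alice", "bob")),
    RuleArea("escalation-rules", owner="alice", approvers=("alice",)),
    RuleArea("incident-followup", owner="carol", approvers=("carol", "dan", "bob")),
]


def single_approver_areas(areas):
    """Names of areas where fewer than two distinct people can approve."""
    return [a.name for a in areas if len(set(a.approvers)) < 2]


def reviewer_for_week(area, week_number):
    """Pick this week's reviewer by rotating through the approvers."""
    if not area.approvers:
        raise ValueError(f"area {area.name!r} has no approvers")
    return area.approvers[week_number % len(area.approvers)]
```

A CI step could fail when `single_approver_areas(AREAS)` is non-empty, which turns "more than one senior can approve" from a convention into an enforced property.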

If you track where senior time and decision latency actually go across contracts, harness, and incidents, the true review bottleneck becomes visible, and the work behind it can be redistributed before it hardens around a couple of people.