When small, high-context teams lean on agents for most draft implementation inside a token-efficient monolith, which specific practices (e.g., contract-level reviews, periodic architecture-drift bundles, capability-tiered agent access) most reliably keep taste and interface quality improving over time instead of slowly regressing toward the agent’s default, and how could you measure that drift in day-to-day PRs?

dhh-agent-first-software-craft

Answer

Use a small set of explicit taste-focused practices plus lightweight PR metrics. Keep humans owning contracts and interfaces; let agents fill in implementation.

Core practices

  1. Contract-level review as the primary gate
  • Review "what this interface/behavior should be" before debating the diff.
  • Require a short contract or intent block on PRs that add/reshape interfaces.
  • Make outcome checks (tests, scenarios) part of that contract.
  2. Taste owners for key surfaces
  • Assign a small set of senior "taste owners" for core modules and UX surfaces.
  • Route any interface change in their area to them or their lane.
  • Rotate juniors through pairing with them on representative PRs.
  3. Periodic architecture/taste bundles
  • Batch small, related agent PRs by area into a periodic "drift bundle" for holistic review.
  • Use bundles to rename, refactor, and realign APIs, not just fix bugs.
  4. Capability-tiered agent access
  • Allow agents to freely edit implementations; restrict interface creation/changes to higher-trust workflows (explicit approvals, richer prompts, more tests).
  • Keep some core boundaries manual-first for teaching and taste.
  5. Style and interface exemplars
  • Maintain a small set of exemplar files for each area that show "how we want this to look".
  • Point agents and humans at exemplars in prompts and review comments.
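As a concrete anchor for practices 1 and 4, here is a minimal sketch of a CI-style merge gate: PRs that touch interface surfaces must carry a contract block in their description. The section names, the `api/`/`contracts/` path convention, and the function names are all assumptions for illustration, not a fixed standard.

```python
import re

# Hypothetical gate: interface-touching PRs need a contract block.
# REQUIRED_SECTIONS and INTERFACE_PATH are assumed conventions.
REQUIRED_SECTIONS = ("## Intent", "## Interface", "## Outcome checks")
INTERFACE_PATH = re.compile(r"(^|/)(api|contracts)/")

def needs_contract(changed_files):
    """True if any changed file lives on an interface surface."""
    return any(INTERFACE_PATH.search(f) for f in changed_files)

def has_contract(pr_body):
    """True if the PR description contains every required section."""
    return all(section in pr_body for section in REQUIRED_SECTIONS)

def gate(changed_files, pr_body):
    """Return (ok, reason) for a simple merge gate."""
    if needs_contract(changed_files) and not has_contract(pr_body):
        return False, "interface change without a contract block"
    return True, "ok"
```

The point of keeping the gate this small is that the hard review still happens between humans; the check only makes the contract-first habit non-optional.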

Measuring drift in day-to-day PRs

  1. Simple PR-level drift signals
  • Interface churn: rate of new/changed public methods, endpoints, events per area.
  • Name and shape inconsistency: new APIs that diverge from local exemplars (measured via light linting/heuristics or agent checks).
  • Test/contract alignment: changes to interfaces without matching updates to tests/docs/contracts.
  2. Taste review mix
  • Ratio of PRs where reviewers leave interface/taste comments vs only correctness comments.
  • Share of interface PRs seen by a taste owner vs general fast-lane review.
  3. Periodic sampling
  • Randomly sample a few recent PRs per area; have a senior rate them on two simple dimensions: "fits local style/taste?" and "interface clearer or messier than before?".
  • Track scores over time; trigger drift bundles or rule tweaks when scores fall.

Together, these keep interface decisions human-led, make agent defaults visible in metrics, and provide concrete points to intervene before taste regresses.