When small, high-context teams lean on agents for most draft implementation inside a token-efficient monolith, which specific practices (e.g., contract-level reviews, periodic architecture-drift bundles, capability-tiered agent access) most reliably keep taste and interface quality improving over time instead of slowly regressing toward the agent’s default, and how could you measure that drift in day-to-day PRs?

dhh-agent-first-software-craft

Answer

Use a small set of explicit taste-focused practices plus lightweight PR metrics. Keep humans owning contracts and interfaces; let agents fill in implementation.

Core practices

  1. Contract-level review as the primary gate
  • Review "what this interface/behavior should be" before debating the diff.
  • Require a short contract or intent block on PRs that add/reshape interfaces.
  • Make outcome checks (tests, scenarios) part of that contract.
  2. Taste owners for key surfaces
  • Assign a small set of senior "taste owners" for core modules and UX surfaces.
  • Route any interface change in their area to them or their lane.
  • Rotate juniors through pairing with them on representative PRs.
  3. Periodic architecture/taste bundles
  • Batch small, related agent PRs by area into a periodic "drift bundle" for holistic review.
  • Use bundles to rename, refactor, and realign APIs, not just fix bugs.
  4. Capability-tiered agent access
  • Allow agents to freely edit implementations; restrict interface creation/changes to higher-trust workflows (explicit approvals, richer prompts, more tests).
  • Keep some core boundaries manual-first for teaching and taste.
  5. Style and interface exemplars
  • Maintain a small set of exemplar files for each area that show "how we want this to look".
  • Point agents and humans at exemplars in prompts and review comments.
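As a concrete anchor for practices 1 and 4, here is a minimal sketch of a CI-style merge gate: PRs that touch interface surfaces must carry a contract block in their description. The section names, the `api/`/`contracts/` path convention, and the function names are all assumptions for illustration, not a fixed standard.

```python
import re

# Hypothetical gate: interface-touching PRs need a contract block.
# REQUIRED_SECTIONS and INTERFACE_PATH are assumed conventions.
REQUIRED_SECTIONS = ("## Intent", "## Interface", "## Outcome checks")
INTERFACE_PATH = re.compile(r"(^|/)(api|contracts)/")

def needs_contract(changed_files):
    """True if any changed file lives on an interface surface."""
    return any(INTERFACE_PATH.search(f) for f in changed_files)

def has_contract(pr_body):
    """True if the PR description contains every required section."""
    return all(section in pr_body for section in REQUIRED_SECTIONS)

def gate(changed_files, pr_body):
    """Return (ok, reason) for a simple merge gate."""
    if needs_contract(changed_files) and not has_contract(pr_body):
        return False, "interface change without a contract block"
    return True, "ok"
```

The point of keeping the gate this small is that the hard review still happens between humans; the check only makes the contract-first habit non-optional.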

Measuring drift in day-to-day PRs

  1. Simple PR-level drift signals
  • Interface churn: rate of new/changed public methods, endpoints, events per area.
  • Name and shape inconsistency: new APIs that diverge from local exemplars (measured via light linting/heuristics or agent checks).
  • Test/contract alignment: changes to interfaces without matching updates to tests/docs/contracts.
  2. Taste review mix
  • Ratio of PRs where reviewers leave interface/taste comments vs only correctness comments.
  • Share of interface PRs seen by a taste owner vs general fast-lane review.
  3. Periodic sampling
  • Randomly sample a few recent PRs per area; have a senior rate them on two simple dimensions: "fits local style/taste?" and "interface clearer or messier than before?".
  • Track scores over time; trigger drift bundles or rule tweaks when scores fall.

Together, these keep interface decisions human-led, make agent defaults visible in metrics, and provide concrete points to intervene before taste regresses.