When small, high-context teams lean on agents for most draft implementation inside a token-efficient monolith, which specific practices (e.g., contract-level reviews, periodic architecture-drift bundles, capability-tiered agent access) most reliably keep taste and interface quality improving over time instead of slowly regressing toward the agent’s default, and how could you measure that drift in day-to-day PRs?
dhh-agent-first-software-craft
Answer
Use a small set of explicit taste-focused practices plus lightweight PR metrics. Keep humans owning contracts and interfaces; let agents fill in implementation.
Core practices
- Contract-level review as the primary gate
  - Review "what this interface/behavior should be" before debating the diff.
  - Require a short contract or intent block on PRs that add or reshape interfaces.
  - Make outcome checks (tests, scenarios) part of that contract.
- Taste owners for key surfaces
  - Assign a small set of senior "taste owners" for core modules and UX surfaces.
  - Route any interface change in their area to them or their lane.
  - Rotate juniors through pairing with them on representative PRs.
- Periodic architecture/taste bundles
  - Batch small, related agent PRs by area into a periodic "drift bundle" for holistic review.
  - Use bundles to rename, refactor, and re-align APIs, not just fix bugs.
- Capability-tiered agent access
  - Allow agents to freely edit implementations; restrict interface creation and changes to higher-trust workflows (explicit approvals, richer prompts, more tests).
  - Keep some core boundaries manual-first for teaching and taste.
- Style and interface exemplars
  - Maintain a small set of exemplar files for each area that show "how we want this to look".
  - Point agents and humans at exemplars in prompts and review comments.
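The contract gate and tiered access above could be wired into CI as a small check. This is a minimal sketch, not an existing tool: the file patterns, the `## Contract` section name, and the tier labels are all illustrative assumptions you would adapt to your repo.

```python
# Hypothetical PR gate: interface-touching changes need a contract block,
# and lower-trust agent tiers may not touch public interfaces at all.
import re

# Paths that count as "public surface" -- an assumption; tune per repo.
INTERFACE_PATTERNS = [r"/api/", r"/public/", r"\.proto$", r"openapi\.ya?ml$"]

def touches_interface(changed_paths):
    """True if any changed file looks like a public surface."""
    return any(re.search(p, path) for p in INTERFACE_PATTERNS for path in changed_paths)

def has_contract_block(pr_description):
    """Look for a '## Contract' section with non-empty content."""
    match = re.search(r"^## Contract\s*\n(.+)", pr_description, re.M | re.S)
    return bool(match and match.group(1).strip())

def gate(changed_paths, pr_description, author_tier="standard"):
    """Return (ok, reason). Restricted tiers are blocked from interface
    changes entirely; everyone else needs the contract block."""
    if not touches_interface(changed_paths):
        return True, "implementation-only change"
    if author_tier == "restricted":
        return False, "this agent tier may not change public interfaces"
    if not has_contract_block(pr_description):
        return False, "interface change needs a '## Contract' section"
    return True, "interface change with contract block"
```

The point of the sketch is that the policy stays declarative and cheap: one list of surface patterns, one required PR section, one tier check.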
Measuring drift in day-to-day PRs
- Simple PR-level drift signals
  - Interface churn: rate of new or changed public methods, endpoints, and events per area.
  - Name and shape inconsistency: new APIs that diverge from local exemplars (measured via light linting, heuristics, or agent checks).
  - Test/contract alignment: changes to interfaces without matching updates to tests, docs, or contracts.
- Taste review mix
  - Ratio of PRs where reviewers leave interface/taste comments vs. only correctness comments.
  - Share of interface PRs seen by a taste owner vs. general fast-lane review.
- Periodic sampling
  - Randomly sample a few recent PRs per area; have a senior rate them on 1–2 simple dimensions: "fits local style/taste?" and "interface clearer or messier than before?".
  - Track scores over time; trigger drift bundles or rule tweaks when scores fall.
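As a sketch, the PR-level signals above can be aggregated per area from simple PR records. The record fields (`area`, `interface_changed`, `tests_changed`, `taste_comments`) are hypothetical names for data your PR tooling would export; this is not a specific platform's API.

```python
# Aggregate drift signals per area from exported PR records (hypothetical schema).
from collections import defaultdict

def drift_signals(prs):
    """Compute interface churn, contract misalignment, and the share of PRs
    that drew taste/interface review comments, grouped by area."""
    by_area = defaultdict(lambda: {"prs": 0, "iface": 0, "misaligned": 0, "taste": 0})
    for pr in prs:
        s = by_area[pr["area"]]
        s["prs"] += 1
        if pr["interface_changed"]:
            s["iface"] += 1
            if not pr["tests_changed"]:
                s["misaligned"] += 1  # interface moved, tests/contracts did not
        if pr["taste_comments"] > 0:
            s["taste"] += 1
    return {
        area: {
            "interface_churn": s["iface"] / s["prs"],
            "misalignment_rate": s["misaligned"] / max(s["iface"], 1),
            "taste_review_share": s["taste"] / s["prs"],
        }
        for area, s in by_area.items()
    }
```

Trending these three numbers per area over a rolling window is usually enough to decide when to trigger a drift bundle, without building a heavyweight dashboard.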
Together, these keep interface decisions human-led, make agent defaults visible in metrics, and provide concrete points to intervene before taste regresses.