If we treat the current focus on stronger harnesses and tighter diff-first review as only one possible response to implementation abundance, what alternative organizing principle—such as contract-centric product engines, outcome-verified features, or UX-owned interface charters—might better govern the ambition frontier and review bottleneck in agent-first teams, and what failure modes would that alternative expose that the harness-centric frame currently hides?
dhh-agent-first-software-craft
Answer
A useful alternative is an outcome-verified contract engine: organize work around small, named behavior contracts with explicit, measurable outcomes, and let those contracts (not harness rules) govern ambition and review.
- Core idea
- Every meaningful behavior ships as a contract: clear pre/post-conditions, UX intent, and 1–3 concrete checks (metrics, logs, scenarios).
- Agents are free inside the contract; humans gate which contracts exist, change, or compose, and what counts as success.
- Review bottleneck moves from “is this diff ok?” to “is this contract worth existing / changing, and are its checks good enough?”
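The contract shape above can be sketched as a plain Ruby value object. All names here (`Contract`, `#verify`) are illustrative, not an existing library:

```ruby
# A behavior contract: a named behavior with explicit intent,
# risk tags, and a small set of concrete checks.
Contract = Struct.new(:name, :intent, :risk_tags, :checks, keyword_init: true) do
  # A contract passes when every check holds against an observed outcome.
  def verify(outcome)
    checks.all? { |check| check.call(outcome) }
  end
end

checkout_contract = Contract.new(
  name: "guest_checkout",
  intent: "A guest can complete checkout without creating an account",
  risk_tags: [:money],
  checks: [
    ->(o) { o[:order_created] },     # post-condition: an order exists
    ->(o) { o[:latency_ms] < 2000 }  # outcome metric: fast enough
  ]
)

checkout_contract.verify(order_created: true, latency_ms: 850) # => true
```

The point of the sketch is the inversion: the agent's implementation is unconstrained, while the human-owned part is the small, inspectable list of checks.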
- How it governs ambition frontier
- You raise ambition by adding or widening contracts (new flows, stricter outcomes, cross-system behaviors), not by loosening harness rules.
- Cheap experiments become reversible hunch probes: temporary contracts with narrow scope and auto-expiry.
- Opinionated stacks and monoliths help by making contracts small and token-cheap (Rails actions, jobs, services).
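Auto-expiry is what keeps probes from silently becoming permanent surface area. A minimal sketch, assuming a hypothetical `TemporaryContract` shape:

```ruby
require "date"

# A temporary contract created by a hunch probe: it carries an
# expiry date so the harness can flag or prune it automatically.
TemporaryContract = Struct.new(:name, :expires_on, keyword_init: true) do
  def active?(today = Date.today)
    today <= expires_on
  end
end

probe = TemporaryContract.new(
  name: "one_click_reorder_probe",
  expires_on: Date.new(2030, 1, 1)  # after this, the contract is stale by construction
)

probe.active?(Date.new(2029, 6, 1))  # => true
probe.active?(Date.new(2030, 2, 1))  # => false
```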
- How it reshapes the review bottleneck
- Primary review questions:
- Is this contract coherent with product strategy and UX taste?
- Are its checks strong enough that we trust agent-chosen implementations?
- Does it cross risky domains (money, auth, migrations, vendor edges) that require stricter verification lanes?
- Code review becomes lighter where contracts + checks are strong; heavier where they are weak or absent.
- What this exposes that harness-centrism hides
- Outcome under-specification: many teams rely on style/tests but have fuzzy “what must never break” definitions. A contract engine forces this to the surface.
- Misaligned success metrics: features that “look good in the diff” but don’t move behavior or UX outcomes become obvious.
- Review fatigue on outcomes: humans may be good at line-level review but weak at repeatedly judging “are these checks really enough?”
- Contract sprawl: without pruning, you get hundreds of small contracts with stale checks, even if harness rules are clean.
- New failure modes vs harness-centric framing
- Metric/UX gaming: teams overfit to what’s easy to check (latency, click-through) and under-specify subtle correctness, domain rules, or aesthetic fit.
- Invisible coupling between contracts: many local contracts interact (e.g., discount rules + billing + analytics); agents change them independently and create emergent bugs.
- Stale or weak checks: contracts look solid on paper but their tests/monitors rot; agents keep shipping “verified” changes that only satisfy outdated checks.
- Outcome-review overload: seniors become the bottleneck on deciding/curating contracts and checks rather than on code, trading diff fatigue for product/metrics fatigue.
- How to encode this lightly in current practice
- Add a tiny `contract.md` (or a Rails-style `contract:` block) near key endpoints/jobs:
- Purpose, inputs/outputs.
- 1–3 scenarios or metrics that must hold.
- Risk tags (money/auth/data-migration/etc.).
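A hypothetical `contract.md` following that shape might look like (the specific behavior and numbers are invented for illustration):

```markdown
# Contract: guest_checkout
Purpose: a guest can complete checkout without creating an account.
Inputs: cart, payment method. Outputs: confirmed order, receipt email.

Checks (must hold):
1. Scenario: guest with a full cart reaches the confirmation page.
2. Metric: p95 checkout latency under 2s.

Risk tags: money
```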
- Harness reads these to:
- Route diffs into lanes (strict vs light review) using risk tags.
- Auto-run scenario scripts or checks.
- Summarize contract impact in PRs for reviewers.
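The lane-routing step above is small enough to sketch directly. Names here (`review_lane`, `STRICT_TAGS`) are assumptions, not a real harness API:

```ruby
# Route a diff into a review lane based on the risk tags of the
# contracts it touches: any strict tag forces the strict lane.
STRICT_TAGS = [:money, :auth, :data_migration].freeze

def review_lane(touched_contracts)
  risky = touched_contracts.any? { |c| (c[:risk_tags] & STRICT_TAGS).any? }
  risky ? :strict : :light
end

review_lane([{ name: "guest_checkout", risk_tags: [:money] }])  # => :strict
review_lane([{ name: "theme_toggle", risk_tags: [] }])          # => :light
```

The design choice worth noting: the harness never interprets the contract body, only its tags, which keeps routing cheap and deterministic.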