If we treat the current focus on stronger harnesses and tighter diff-first review as only one possible response to implementation abundance, what alternative organizing principle—such as contract-centric product engines, outcome-verified features, or UX-owned interface charters—might better govern the ambition frontier and review bottleneck in agent-first teams, and what failure modes would that alternative expose that the harness-centric frame currently hides?
dhh-agent-first-software-craft
Answer
A useful alternative is an outcome-verified contract engine: organize work around small, named behavior contracts with explicit, measurable outcomes, and let those contracts (not harness rules) govern ambition and review.
- Core idea
- Every meaningful behavior ships as a contract: clear pre/post-conditions, UX intent, and 1–3 concrete checks (metrics, logs, scenarios).
- Agents are free inside the contract; humans gate which contracts exist, change, or compose, and what counts as success.
- Review bottleneck moves from “is this diff ok?” to “is this contract worth existing / changing, and are its checks good enough?”
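The contract shape above can be sketched as a plain Ruby value object. All names here (`Contract`, `#verify`) are illustrative, not an existing library:

```ruby
# A behavior contract: a named behavior with explicit intent,
# risk tags, and a small set of concrete checks.
Contract = Struct.new(:name, :intent, :risk_tags, :checks, keyword_init: true) do
  # A contract passes when every check holds against an observed outcome.
  def verify(outcome)
    checks.all? { |check| check.call(outcome) }
  end
end

checkout_contract = Contract.new(
  name: "guest_checkout",
  intent: "A guest can complete checkout without creating an account",
  risk_tags: [:money],
  checks: [
    ->(o) { o[:order_created] },     # post-condition: an order exists
    ->(o) { o[:latency_ms] < 2000 }  # outcome metric: fast enough
  ]
)

checkout_contract.verify(order_created: true, latency_ms: 850) # => true
```

The point of the sketch is the inversion: the agent's implementation is unconstrained, while the human-owned part is the small, inspectable list of checks.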
- How it governs ambition frontier
- You raise ambition by adding or widening contracts (new flows, stricter outcomes, cross-system behaviors), not by loosening harness rules.
- Cheap experiments become reversible hunch probes: temporary contracts with narrow scope and auto-expiry.
- Opinionated stacks and monoliths help by making contracts small and token-cheap (Rails actions, jobs, services).
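Auto-expiry is what keeps probes from silently becoming permanent surface area. A minimal sketch, assuming a hypothetical `TemporaryContract` shape:

```ruby
require "date"

# A temporary contract created by a hunch probe: it carries an
# expiry date so the harness can flag or prune it automatically.
TemporaryContract = Struct.new(:name, :expires_on, keyword_init: true) do
  def active?(today = Date.today)
    today <= expires_on
  end
end

probe = TemporaryContract.new(
  name: "one_click_reorder_probe",
  expires_on: Date.new(2030, 1, 1)  # after this, the contract is stale by construction
)

probe.active?(Date.new(2029, 6, 1))  # => true
probe.active?(Date.new(2030, 2, 1))  # => false
```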
- How it reshapes the review bottleneck
- Primary review questions:
- Is this contract coherent with product strategy and UX taste?
- Are its checks strong enough that we trust agent-chosen implementations?
- Does it cross risky domains (money, auth, migrations, vendor edges) that require stricter verification lanes?
- Code review becomes lighter where contracts + checks are strong; heavier where they are weak or absent.
- What this exposes that harness-centrism hides
- Outcome under-specification: many teams rely on style/tests but have fuzzy “what must never break” definitions. A contract engine forces this to the surface.
- Misaligned success metrics: features that “look good in the diff” but don’t move behavior or UX outcomes become obvious.
- Review fatigue on outcomes: humans may be good at line-level review but weak at repeatedly judging “are these checks really enough?”
- Contract sprawl: without pruning, you get hundreds of small contracts with stale checks, even if harness rules are clean.
- New failure modes vs harness-centric framing
- Metric/UX gaming: teams overfit to what’s easy to check (latency, click-through) and under-specify subtle correctness, domain rules, or aesthetic fit.
- Invisible coupling between contracts: many local contracts interact (e.g., discount rules + billing + analytics); agents change them independently and create emergent bugs.
- Stale or weak checks: contracts look solid on paper but their tests/monitors rot; agents keep shipping “verified” changes that only satisfy outdated checks.
- Outcome-review overload: seniors become the bottleneck on deciding/curating contracts and checks rather than on code, trading diff fatigue for product/metrics fatigue.
- How to encode this lightly in current practice
- Add a tiny `contract.md` (or a Rails-style `contract:` block) near key endpoints/jobs:
- Purpose, inputs/outputs.
- 1–3 scenarios or metrics that must hold.
- Risk tags (money/auth/data-migration/etc.).
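A hypothetical `contract.md` following that shape might look like (the specific behavior and numbers are invented for illustration):

```markdown
# Contract: guest_checkout
Purpose: a guest can complete checkout without creating an account.
Inputs: cart, payment method. Outputs: confirmed order, receipt email.

Checks (must hold):
1. Scenario: guest with a full cart reaches the confirmation page.
2. Metric: p95 checkout latency under 2s.

Risk tags: money
```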
- Harness reads these to:
- Route diffs into lanes (strict vs light review) using risk tags.
- Auto-run scenario scripts or checks.
- Summarize contract impact in PRs for reviewers.
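The lane-routing step above is small enough to sketch directly. Names here (`review_lane`, `STRICT_TAGS`) are assumptions, not a real harness API:

```ruby
# Route a diff into a review lane based on the risk tags of the
# contracts it touches: any strict tag forces the strict lane.
STRICT_TAGS = [:money, :auth, :data_migration].freeze

def review_lane(touched_contracts)
  risky = touched_contracts.any? { |c| (c[:risk_tags] & STRICT_TAGS).any? }
  risky ? :strict : :light
end

review_lane([{ name: "guest_checkout", risk_tags: [:money] }])  # => :strict
review_lane([{ name: "theme_toggle", risk_tags: [] }])          # => :light
```

The design choice worth noting: the harness never interprets the contract body, only its tags, which keeps routing cheap and deterministic.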