If we treat the current focus on stronger harnesses and tighter diff-first review as only one possible response to implementation abundance, what alternative organizing principle—such as contract-centric product engines, outcome-verified features, or UX-owned interface charters—might better govern the ambition frontier and review bottleneck in agent-first teams, and what failure modes would that alternative expose that the harness-centric frame currently hides?

dhh-agent-first-software-craft

Answer

A useful alternative is an outcome-verified contract engine: organize work around small, named behavior contracts with explicit, measurable outcomes, and let those contracts (not harness rules) govern ambition and review.

  1. Core idea
  • Every meaningful behavior ships as a contract: clear pre/post-conditions, UX intent, and 1–3 concrete checks (metrics, logs, scenarios).
  • Agents are free inside the contract; humans gate which contracts exist, change, or compose, and what counts as success.
  • The review bottleneck moves from “is this diff OK?” to “is this contract worth existing or changing, and are its checks good enough?”
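A behavior contract in this sense can be sketched as a plain Ruby value object. This is a minimal illustration only; `Contract`, `#satisfied_by?`, and the check shapes are assumed names, not an existing library:

```ruby
# Hypothetical sketch: a behavior contract as a plain value object.
# Names (Contract, #satisfied_by?) are illustrative, not a real library.
Contract = Struct.new(:name, :intent, :checks, :risk_tags, keyword_init: true) do
  # A contract holds when every named check passes for the observed outcome.
  def satisfied_by?(outcome)
    checks.all? { |_label, check| check.call(outcome) }
  end
end

# Example: a checkout discount must never corrupt the order total.
discount_contract = Contract.new(
  name: "checkout.discount",
  intent: "Apply promo codes without breaking the order total",
  checks: {
    "total is never negative" => ->(o) { o[:total] >= 0 },
    "discount is bounded"     => ->(o) { o[:discount] <= o[:subtotal] }
  },
  risk_tags: [:money]
)
```

Agents can change anything behind `satisfied_by?`; humans review the contract's name, intent, and checks rather than the diff.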
  2. How it governs the ambition frontier
  • You raise ambition by adding or widening contracts (new flows, stricter outcomes, cross-system behaviors), not by loosening harness rules.
  • Cheap experiments become reversible hunch probes: temporary contracts with narrow scope and auto-expiry.
  • Opinionated stacks and monoliths help by making contracts small and token-cheap (Rails actions, jobs, services).
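A hunch probe can be sketched as a contract with a built-in expiry date, so an experiment cannot silently become load-bearing. All names here are illustrative assumptions:

```ruby
require "date"

# Hypothetical sketch: a temporary contract that auto-expires, so a cheap
# experiment cannot quietly become load-bearing. Names are illustrative.
TemporaryContract = Struct.new(:name, :checks, :expires_on, keyword_init: true) do
  def expired?(today = Date.today)
    today > expires_on
  end

  # An expired probe fails closed: its checks no longer count as verification.
  def satisfied_by?(outcome, today: Date.today)
    return false if expired?(today)
    checks.all? { |_label, check| check.call(outcome) }
  end
end

probe = TemporaryContract.new(
  name: "experiment.one-click-reorder",
  checks: { "reorder completes" => ->(o) { o[:status] == :ok } },
  expires_on: Date.today + 14 # narrow scope, two-week auto-expiry
)
```

Failing closed on expiry is the design choice that keeps probes reversible: past the deadline, the team must either promote the probe to a real contract or delete it.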
  3. How it reshapes the review bottleneck
  • Primary review questions:
    • Is this contract coherent with product strategy and UX taste?
    • Are its checks strong enough that we trust agent-chosen implementations?
    • Does it cross risky domains (money, auth, migrations, vendor edges) that require stricter verification lanes?
  • Code review becomes lighter where contracts + checks are strong; heavier where they are weak or absent.
  4. What this exposes that harness-centrism hides
  • Outcome under-specification: many teams rely on style/tests but have fuzzy “what must never break” definitions. A contract engine forces this to the surface.
  • Misaligned success metrics: features that “look good in diff” but don’t move behavior or UX outcomes become obvious.
  • Review fatigue on outcomes: humans may be good at line-level review but weak at repeatedly judging “are these checks really enough?”
  • Contract sprawl: without pruning, you get hundreds of small contracts with stale checks, even if harness rules are clean.
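Pruning against contract sprawl can be mechanical: a staleness pass over the contract registry surfaces rotting checks before they mislead review. The registry shape, the `:last_verified_on` field, and the 30-day threshold are all assumptions for illustration:

```ruby
require "date"

# Hypothetical staleness pass over a contract registry; the field names
# and the 30-day default threshold are illustrative assumptions.
def stale_contracts(contracts, max_age_days: 30, today: Date.today)
  contracts.select { |c| (today - c[:last_verified_on]) > max_age_days }
           .map { |c| c[:name] }
end

registry = [
  { name: "checkout.discount", last_verified_on: Date.today - 2 },
  { name: "billing.invoice",   last_verified_on: Date.today - 90 }
]

stale_contracts(registry) # => ["billing.invoice"]
```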
  5. New failure modes vs harness-centric framing
  • Metric/UX gaming: teams overfit to what’s easy to check (latency, click-through) and under-specify subtle correctness, domain rules, or aesthetic fit.
  • Invisible coupling between contracts: many local contracts interact (e.g., discount rules + billing + analytics); agents change them independently and create emergent bugs.
  • Stale or weak checks: contracts look solid on paper but their tests/monitors rot; agents keep shipping “verified” changes that only satisfy outdated checks.
  • Outcome-review overload: seniors become the bottleneck on deciding/curating contracts and checks rather than on code, trading diff fatigue for product/metrics fatigue.
  6. How to encode this lightly in current practice
  • Add a tiny contract.md (or Rails-style contract: block) near key endpoints/jobs:
    • Purpose, inputs/outputs.
    • 1–3 scenarios or metrics that must hold.
    • Risk tags (money/auth/data-migration/etc.).
  • Harness reads these to:
    • Route diffs into lanes (strict vs light review) using risk tags.
    • Auto-run scenario scripts or checks.
    • Summarize contract impact in PRs for reviewers.
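The lane-routing step above can be sketched as a small function over risk tags. The tag names and lane labels are illustrative assumptions, not an existing harness API:

```ruby
# Hypothetical lane routing: certain risk tags force the strict review lane.
# Tag and lane names are illustrative assumptions.
STRICT_TAGS = [:money, :auth, :data_migration].freeze

def review_lane(contract)
  (contract[:risk_tags] & STRICT_TAGS).any? ? :strict : :light
end

review_lane(risk_tags: [:money, :analytics]) # => :strict
review_lane(risk_tags: [:analytics])         # => :light
```

Because routing keys off declared tags rather than diff contents, a missing or wrong tag is itself a reviewable contract error, which is exactly where this frame wants human attention to go.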