If we assume implementation abundance but deliberately remove most existing harness constraints—no boundary tags, few risk lanes, minimal verification scripts—and instead make humans responsible only for post-hoc diff review and incident handling, what breaks first in real teams (architecture, safety, craft, or pace), and how would that breakdown challenge the current belief that stronger harnesses are the primary lever for preserving judgment and taste in agent-first workflows?

dhh-agent-first-software-craft

Answer

Most teams that drop harness constraints but keep agents and rely only on diff review will see safety and craft degrade first, with architecture drifting next and pace eventually stalling. This pattern weakens the idea that harness strength alone protects judgment and taste. It suggests the main levers are where constraints sit relative to the review bottleneck and how much context and time reviewers actually have.

Compact view:

  • What breaks first:

    1. Safety: more subtle regressions, a heavier incident load, and reluctance to touch fragile areas.
    2. Craft: inconsistent patterns, noisier diffs, shallow tests, copy-paste fixes.
    3. Architecture: slow boundary erosion that becomes visible through incidents and brittle tests.
    4. Pace: net speed drops as incident handling and rework eat into the gains from faster code generation.
  • Why safety and craft go first:

    • Agents keep shipping plausible code; without harness checks, reviewers see large, mixed-quality diffs.
    • Reviewers under time pressure skim and trust green tests; gaps in test coverage and verification then let more defects reach production.
    • Taste is hard to exercise over big, dense diffs; reviewers default to “good enough.”
  • How this challenges “stronger harnesses are the main lever”:

    • The core constraint becomes human review capacity and attention, not just whether the harness is strict.
    • Even strong harnesses fail if reviewers are overloaded or context-poor. Conversely, small, high-context teams with weak harnesses but very selective merging can preserve taste better than a harness-centric view would predict.
    • The most important design choice is where to place constraints relative to the review bottleneck:
      • At task intake (what work is agent-eligible).
      • At diff granularity (enforcing small, scoped changes).
      • At human merge gates (who can approve which areas, with what tests).
  • Implication for agent-first workflows:

    • Harness rules help, but they are supporting tools for human judgment, not substitutes.
    • To preserve taste and safety under implementation abundance, teams should:
      • Keep diffs small and boundary-scoped.
      • Route high-risk changes to high-context reviewers.
      • Use harness checks mainly to keep reviewable units clean, not to be the only guardians of craft.
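To make the three constraint placements concrete, here is a minimal sketch of a pre-merge check in Python. Everything in it is hypothetical: `ChangeRequest`, `check_change`, the path prefixes, the task labels, and the 200-line limit are illustrative assumptions, not part of any specific harness or tool. The check deliberately does not try to judge craft. It only decides what is allowed to reach a reviewer and which reviewer must sign off.

```python
from dataclasses import dataclass, field

# Hypothetical policy values; the numbers, labels, and prefixes are illustrative only.
MAX_CHANGED_LINES = 200                               # diff-granularity limit
MAX_AREAS = 1                                         # top-level areas one diff may touch
HIGH_RISK_PREFIXES = ("billing/", "auth/", "migrations/")
AGENT_INELIGIBLE_LABELS = {"security", "data-migration"}  # task-intake rule


@dataclass
class ChangeRequest:
    task_labels: set[str]
    changed_files: dict[str, int]  # path -> number of changed lines
    approvers: set[str]
    tests_added: bool


@dataclass
class GateResult:
    ok: bool
    reasons: list[str] = field(default_factory=list)


def check_change(cr: ChangeRequest, high_context_reviewers: set[str]) -> GateResult:
    reasons: list[str] = []

    # 1. Task intake: some work is never handed to an agent.
    blocked = cr.task_labels & AGENT_INELIGIBLE_LABELS
    if blocked:
        reasons.append(f"task not agent-eligible: {sorted(blocked)}")

    # 2. Diff granularity: keep each reviewable unit small and boundary-scoped.
    total = sum(cr.changed_files.values())
    if total > MAX_CHANGED_LINES:
        reasons.append(f"diff too large: {total} > {MAX_CHANGED_LINES} lines")
    areas = {path.split("/", 1)[0] for path in cr.changed_files}
    if len(areas) > MAX_AREAS:
        reasons.append(f"diff spans multiple boundaries: {sorted(areas)}")

    # 3. Merge gate: high-risk paths need a high-context approver and new tests.
    risky = [p for p in cr.changed_files if p.startswith(HIGH_RISK_PREFIXES)]
    if risky:
        if not cr.approvers & high_context_reviewers:
            reasons.append(f"high-risk paths need a high-context reviewer: {risky}")
        if not cr.tests_added:
            reasons.append("high-risk change without new tests")

    return GateResult(ok=not reasons, reasons=reasons)


# Usage: a small, single-area billing change approved by someone outside the
# high-context reviewer group is routed back rather than merged.
cr = ChangeRequest(
    task_labels={"feature"},
    changed_files={"billing/invoice.py": 40, "billing/tests/test_invoice.py": 30},
    approvers={"alice"},
    tests_added=True,
)
result = check_change(cr, high_context_reviewers={"bob"})
print(result.ok)  # False
```

The design choice mirrors the answer's conclusion: the harness keeps reviewable units clean and routes risk to people with context, while the actual judgment about taste stays with the human at the merge gate.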