For designer-implementers using a CLI substrate in an agent-first workflow, what concrete mismatches between expressed UX intent (flow specs, story tests) and resulting diffs most often force senior engineer intervention, and how can harness patterns (commands, templates, verification scripts) be adjusted so that more of this “taste translation” is automated without eroding the engineering-owned craft bar on architecture and safety?

dhh-agent-first-software-craft

Answer

Common mismatches and harness tweaks:

  1. Frequent UX–diff mismatches needing seniors
  • Flow seams vs code seams

    • Spec has a clean user journey; diff scatters logic across handlers, jobs, and helpers.
    • Example: one “checkout” story becomes 3–4 loosely coupled callbacks.
  • State and invariants

    • Spec implies simple, linear states; diff encodes ad-hoc flags and mixed concerns.
    • Example: “draft → submitted → approved” becomes booleans sprinkled across models.
  • Error and edge behavior

    • Story tests say what the user should see; diff only handles happy path or generic 500s.
    • Example: specific validation copy or recovery paths missing or inconsistent.
  • Naming and domain language

    • Flow spec uses stable terms; diff introduces new or mixed names.
    • Example: “workspace” in UX becomes Account, Org, and Team in code.
  • Boundary fit

    • Intent is “one team owns this flow”; diff cuts across boundaries or skips façades.
    • Example: CLI tool wires agent to multiple services directly instead of going through the app boundary.
  • Tests as UX contracts

    • Story tests describe behavior; generated tests cover API shapes or one-off examples.
    • Example: a happy-path story ends up as a single controller test with no scenario coverage.
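Several of these mismatches become mechanically checkable once the spec's states are encoded explicitly rather than as ad-hoc flags. A minimal sketch (all names are illustrative, not from any real codebase) of a story test that reads like the "draft → submitted → approved" spec:

```python
# Hypothetical sketch: encode the spec's linear states as an explicit
# machine so story tests can assert transitions, instead of letting the
# diff introduce scattered booleans. All names here are illustrative.
from enum import Enum

class DocState(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"

# The only transitions the flow spec promises.
ALLOWED = {
    DocState.DRAFT: {DocState.SUBMITTED},
    DocState.SUBMITTED: {DocState.APPROVED},
    DocState.APPROVED: set(),
}

def transition(current: DocState, target: DocState) -> DocState:
    """Refuse any move the spec does not name."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target

# A story test now reads like the spec: draft -> submitted -> approved.
state = DocState.DRAFT
state = transition(state, DocState.SUBMITTED)
state = transition(state, DocState.APPROVED)
print(state.value)  # approved
```

A diff that adds a boolean like `is_submitted` instead of extending `ALLOWED` fails this kind of story test, which is exactly the signal that currently requires a senior's eye.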
  2. Harness patterns to absorb more taste translation
  • Command patterns

    • Provide narrow verbs per flow type: ux_flow:new, ux_flow:extend, ux_flow:refine_copy, ux_flow:edge_cases instead of a generic “change this file.”
    • Each verb loads flow specs, domain glossary, boundary map, and existing examples before agent runs.
  • Templates

    • Flow briefs
      • Short YAML/markdown capturing: primary story, states, key names, error promises, owning boundary.
      • Commands require a brief; agents must echo it back and map code changes to each section.
    • Scenario bundles
      • For each flow, keep a small .json/.rb/.py bundle of canonical stories (happy, edge, failure).
      • Harness templates generate tests from these, not from raw prompts.
  • Verification scripts

    • Diff mappers
      • Script: given a brief + diff, list which stories, states, errors, and boundaries changed.
      • Fail/warn if new code paths don’t map cleanly to any story, or if a story has lost coverage.
    • Name and glossary checks
      • Script: scan new/changed names; flag when they diverge from glossary or same-flow usage.
    • Boundary checks
      • Script: enforce that flows touch only declared boundaries or known façades; else route to arch_review lane.
    • Story test runners
      • For each flow, run a small, human-readable scenario harness (CLI script, feature test) and show outputs beside the original story text.
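The brief-plus-diff-mapper idea above can be sketched in a few lines. This is a hypothetical shape, assuming a brief that maps stories to path prefixes; the field names and paths are invented for illustration:

```python
# Hypothetical sketch of a "diff mapper" check: given a flow brief and
# the paths a diff touched, report which stories are served and flag
# changes that map to nothing in the brief. All names are invented.
BRIEF = {
    "flow": "checkout",
    "stories": {
        "happy_path": ["app/checkout/"],
        "card_declined": ["app/checkout/errors/"],
    },
    "owning_boundary": "app/checkout/",
}

def map_diff(brief: dict, changed_paths: list[str]) -> dict:
    """Classify each changed path: which story it serves, or 'unmapped'."""
    report = {story: [] for story in brief["stories"]}
    report["unmapped"] = []
    for path in changed_paths:
        hits = [story for story, prefixes in brief["stories"].items()
                if any(path.startswith(p) for p in prefixes)]
        if hits:
            for story in hits:
                report[story].append(path)
        else:
            report["unmapped"].append(path)
    return report

report = map_diff(BRIEF, [
    "app/checkout/cart.py",
    "app/billing/ledger.py",   # outside the declared boundary
])
print(report["unmapped"])  # ['app/billing/ledger.py']
```

The "fail/warn" behavior then falls out naturally: a non-empty `unmapped` list is a warning, and an empty story bucket after a change to that flow suggests lost coverage.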
  3. Guarding architecture and safety
  • Lanes and caps

    • Non-arch flows
      • Allow designer-owned commands to run freely when they stay within a boundary and pass story + regression checks.
    • Arch-sensitive flows
      • Auto-escalate when scripts detect new cross-boundary calls, new long-lived states, or schema changes.
      • Require senior sign-off and possibly separate commands (arch_move, new_boundary_flow).
  • Explicit safety rails

    • Tool tiers
      • Expose only “safe” tools (copy tweaks, view structure, routing within boundary) to default UX commands.
      • Keep mutating data shape, auth, and cross-system glue in higher-tier tools gated by engineers.
    • Checklists in CLI
      • For escalated flows, show a very short checklist: invariants touched, boundaries touched, data migrations, rollback path.
  • Review surfaces

    • Summaries over raw diffs
      • CLI prints a one-page “UX contract vs implementation” summary per run: stories, states, boundaries, risky changes.
      • Seniors mostly review these summaries + a few focused files, not entire agent-generated diffs.
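The lane routing and summary surface described above can be sketched together. The escalation signals, field names, and lane labels here are assumptions for illustration, not a real CLI's API:

```python
# Hypothetical sketch of the lane router: stay in the designer-owned lane
# while changes sit inside declared boundaries; auto-escalate on the
# arch-sensitive signals named in the text. Names are illustrative.
ESCALATION_SIGNALS = {"cross_boundary_call", "new_long_lived_state", "schema_change"}

def choose_lane(signals: set[str]) -> str:
    if signals & ESCALATION_SIGNALS:
        return "arch_review"      # senior sign-off required
    return "designer_owned"       # free to run with story + regression checks

def summarize(run: dict) -> str:
    """One-page 'UX contract vs implementation' summary for review."""
    lane = choose_lane(set(run["signals"]))
    lines = [
        f"flow: {run['flow']}",
        f"stories covered: {', '.join(run['stories'])}",
        f"boundaries touched: {', '.join(run['boundaries'])}",
        f"lane: {lane}",
    ]
    return "\n".join(lines)

print(summarize({
    "flow": "checkout",
    "stories": ["happy_path", "card_declined"],
    "boundaries": ["app/checkout/"],
    "signals": ["schema_change"],
}))
```

Seniors then review this short summary first, opening the raw diff only when the lane is `arch_review` or the summary looks off.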

Net: encode more of UX–code mapping in commands, briefs, and verification scripts, and gate only boundary, state, and safety-sensitive changes through senior review. That automates much of taste translation while keeping the craft bar for architecture and safety firmly with engineers.