For teams that treat reversible hunch probes and designer-owned CLI flows as first-class lanes, which concrete probe outcomes and harness signals (e.g., repeated “adopt with heavy rewrite,” frequent expired-but-resurrected probes, or probe tools quietly becoming de facto production paths) should trigger a structured “probe hardening” phase, and how should that phase be encoded in commands, tags, and verification scripts so ambition expansion continues while the verification layer and review bottleneck stay legible?

dhh-agent-first-software-craft | Updated at 2026-04-09 09:21

Answer

Trigger hardening on repeat patterns and encode it as an explicit lane.

Probe outcomes / signals that should flip to hardening

High-rewrite adoption
- Pattern: many probes end as “adopt with heavy rewrite.”
- Meaning: the idea is sound but the lane is leaking messy code.
- Trigger rule: ≥3 probes on same flow/area in 2–4 weeks with “adopt + heavy rewrite” notes.
Expired but resurrected probes
- Pattern: probes hit TTL, are “temporary,” then get revived or copied.
- Meaning: the flow has real value and wants to be productized.
- Trigger rule: any probe touched after expiry or cloned into a new probe branch.
Probe CLI tools becoming de facto paths
- Pattern: tier:experimental commands show frequent use in logs, or are invoked in incidents or by other teams.
- Meaning: the probe is being relied on as real automation.
- Trigger rule: usage / week over threshold (e.g., >N runs or any on-call use).
Repeated manual safety patches
- Pattern: reviewers keep adding the same checks (auth, invariants, logging) to probe-driven diffs.
- Trigger rule: same class of fix requested ≥3 times on related probes.
Cross-boundary or money/data touching probes
- Pattern: probe flows cross stable boundaries or touch money/data paths.
- Trigger rule: any probe diff tagged risk:money, risk:data, or that hits guarded directories.

When any trigger fires, the harness should open a “probe hardening” lane instead of another casual probe.

Encoding the hardening phase in commands and tags

New lane + tags
- Tag: lane:probe_hardening.
- Extra tags: probe:source=<id>, risk:<class>, owner:<handle>.
- Status tags: hardening:design, hardening:impl, hardening:verify, hardening:done.
CLI commands
- probe:run (existing): --ttl, --intent, --owner.
- probe:hardening:init --from=<probe_id>
  - Clones probe manifest.
  - Creates hardening branch.
  - Adds tags lane:probe_hardening, probe:source.
- probe:hardening:plan
  - Agent drafts short plan: scope, boundaries, checks, exit criteria.
- probe:hardening:apply
  - Runs targeted changes under plan.
- probe:hardening:verify
  - Runs and reports all required checks.

Encoding hardening in manifests and verification scripts

Manifest fields (per hardening lane)
- source_probe_id
- intent (1–2 lines)
- blast_radius (e.g., local_flow, cross_boundary, money/data)
- required_checks (list of scripts/tests)
- promotion_target (cli_tool, internal_api, ux_flow, runbook_step)
Verification scripts
- verify:probes:usage
  - Fails if tier:experimental command over usage threshold and not in hardening.
- verify:probes:expiry
  - Fails on branches where expired probes are changed without a hardening lane.
- verify:probes:safety
  - Checks: auth, logging, invariants, error handling present for high-blast-radius lanes.
- verify:probes:boundary
  - Flags cross-boundary calls; requires explicit allowlist in manifest.
PR gating rules
- Any diff tagged lane:probe_hardening must:
  - Reference source_probe_id.
  - Include hardening:plan summary.
  - Pass verify:probes:* scripts.
  - Get at least one senior or custodian review for blast_radius != local_flow.

Keeping ambition high and review legible

Make triggers automatic, not taste-based
- Usage metrics, rewrite counts, expired-edit detection, and risk tags should auto-suggest hardening.
Keep hardening small and timeboxed
- Default TTL for hardening lanes (e.g., one sprint) with hardening:done or rollback decision.
Separate low-risk codification from high-risk
- blast_radius=local_flow: scripts + normal reviewer.
- blast_radius=cross_boundary|money/data: scripts + custodian + maybe extra tests.
Visible dashboards, simple questions
- Dashboard: active_probes, active_hardening, experimental_tools_over_threshold.
- In review UI, show: "Is this still a probe, or is this the moment we harden?" when thresholds are close, to nudge explicit calls.

This keeps reversible hunch probes cheap, but forces repeated value or risk patterns into a clear “probe hardening” lane with tags, commands, and verifiers that the review bottleneck can see and reason about quickly.