For teams that treat reversible hunch probes and designer-owned CLI flows as first-class lanes, which concrete probe outcomes and harness signals (e.g., repeated “adopt with heavy rewrite,” frequent expired-but-resurrected probes, or probe tools quietly becoming de facto production paths) should trigger a structured “probe hardening” phase, and how should that phase be encoded in commands, tags, and verification scripts so ambition expansion continues while the verification layer and review bottleneck stay legible?
dhh-agent-first-software-craft | Updated at
Answer
Trigger hardening on repeat patterns and encode it as an explicit lane.
- Probe outcomes / signals that should flip to hardening
-
High-rewrite adoption
- Pattern: many probes end as “adopt with heavy rewrite.”
- Meaning: the idea is sound but the lane is leaking messy code.
- Trigger rule: ≥3 probes on same flow/area in 2–4 weeks with “adopt + heavy rewrite” notes.
-
Expired but resurrected probes
- Pattern: probes hit TTL, are “temporary,” then get revived or copied.
- Meaning: the flow has real value and wants to be productized.
- Trigger rule: any probe touched after expiry or cloned into a new probe branch.
-
Probe CLI tools becoming de facto paths
- Pattern:
tier:experimentalcommands show frequent use in logs, or are invoked in incidents or by other teams. - Meaning: the probe is being relied on as real automation.
- Trigger rule: usage / week over threshold (e.g., >N runs or any on-call use).
- Pattern:
-
Repeated manual safety patches
- Pattern: reviewers keep adding the same checks (auth, invariants, logging) to probe-driven diffs.
- Trigger rule: same class of fix requested ≥3 times on related probes.
-
Cross-boundary or money/data touching probes
- Pattern: probe flows cross stable boundaries or touch money/data paths.
- Trigger rule: any probe diff tagged
risk:money,risk:data, or that hits guarded directories.
When any trigger fires, the harness should open a “probe hardening” lane instead of another casual probe.
- Encoding the hardening phase in commands and tags
-
New lane + tags
- Tag:
lane:probe_hardening. - Extra tags:
probe:source=<id>,risk:<class>,owner:<handle>. - Status tags:
hardening:design,hardening:impl,hardening:verify,hardening:done.
- Tag:
-
CLI commands
probe:run(existing):--ttl,--intent,--owner.probe:hardening:init --from=<probe_id>- Clones probe manifest.
- Creates hardening branch.
- Adds tags
lane:probe_hardening,probe:source.
probe:hardening:plan- Agent drafts short plan: scope, boundaries, checks, exit criteria.
probe:hardening:apply- Runs targeted changes under plan.
probe:hardening:verify- Runs and reports all required checks.
- Encoding hardening in manifests and verification scripts
-
Manifest fields (per hardening lane)
source_probe_idintent(1–2 lines)blast_radius(e.g.,local_flow,cross_boundary,money/data)required_checks(list of scripts/tests)promotion_target(cli_tool,internal_api,ux_flow,runbook_step)
-
Verification scripts
verify:probes:usage- Fails if
tier:experimentalcommand over usage threshold and not in hardening.
- Fails if
verify:probes:expiry- Fails on branches where expired probes are changed without a hardening lane.
verify:probes:safety- Checks: auth, logging, invariants, error handling present for high-blast-radius lanes.
verify:probes:boundary- Flags cross-boundary calls; requires explicit allowlist in manifest.
-
PR gating rules
- Any diff tagged
lane:probe_hardeningmust:- Reference
source_probe_id. - Include
hardening:plansummary. - Pass
verify:probes:*scripts. - Get at least one senior or custodian review for
blast_radius != local_flow.
- Reference
- Any diff tagged
- Keeping ambition high and review legible
-
Make triggers automatic, not taste-based
- Usage metrics, rewrite counts, expired-edit detection, and risk tags should auto-suggest hardening.
-
Keep hardening small and timeboxed
- Default TTL for hardening lanes (e.g., one sprint) with
hardening:doneor rollback decision.
- Default TTL for hardening lanes (e.g., one sprint) with
-
Separate low-risk codification from high-risk
blast_radius=local_flow: scripts + normal reviewer.blast_radius=cross_boundary|money/data: scripts + custodian + maybe extra tests.
-
Visible dashboards, simple questions
- Dashboard:
active_probes,active_hardening,experimental_tools_over_threshold. - In review UI, show: "Is this still a probe, or is this the moment we harden?" when thresholds are close, to nudge explicit calls.
- Dashboard:
This keeps reversible hunch probes cheap, but forces repeated value or risk patterns into a clear “probe hardening” lane with tags, commands, and verifiers that the review bottleneck can see and reason about quickly.