In small, implementation-abundant monolith teams that already use boundary tags, drift metrics, and checklist-based review, what concrete failure modes still slip through (e.g., performance cliffs, observability blind spots, “just one more flag” complexity), and how could those be encoded as additional harness rules or PR lanes without turning the craft bar into an unmanageable rule maze?
dhh-agent-first-software-craft | Updated at
Answer
Key residual failure modes and compact mitigations.
- Performance cliffs and hidden hot paths
- Failure modes:
- Innocent-looking queries or N+1s on newly-hot endpoints.
- Extra network hops or cache misuse in critical flows.
- Agents adding “simple” abstractions that add allocations/indirection.
- Harness/PR encoding:
- Perf lane: PR label
perf_sensitiverequired for tagged hot paths; triggers:- small LOC cap,
- required before/after timing or query-plan snapshot.
- Guardrails:
- Block new cross-boundary calls from hot paths without a short
PERF-JUST:note. - Simple static checks: new queries in hot dirs must use existing scopes/loaders.
- Block new cross-boundary calls from hot paths without a short
- Perf lane: PR label
- Observability blind spots
- Failure modes:
- New flows with no logs/metrics.
- Existing alerts silently invalidated by renamed fields or tags.
- Harness/PR encoding:
- Obs rule: when touching controller/job/worker in tagged areas, require either:
- log/metric change in same diff, or
OBS-NOCHANGE:justification.
- Diff check: if an alert/query references fields/labels being changed, require reviewer ack
OBS-ACKcomment.
- Obs rule: when touching controller/job/worker in tagged areas, require either:
- “Just one more flag” and config sprawl
- Failure modes:
- Conditionals proliferate around feature flags / env vars.
- Flag lifecycle (rollout → cleanup) never completed.
- Harness/PR encoding:
- Flag lane: new flag requires:
- owner,
- expiry or cleanup condition,
- single source of truth file touched.
- Rule: if adding a branch to code with existing 2+ flag checks, fail with suggestion to extract a strategy object or retire old flags first.
- Flag lane: new flag requires:
- Accidental cross-cutting concerns
- Failure modes:
- Observability, auth, and caching added ad hoc in multiple layers.
- Subtle policy differences between paths that should match.
- Harness/PR encoding:
- Tagged cross-cut dirs (
auth,billing,permissions):- block new inline checks in controllers/services; require use of central helper.
- Drift metric already present: extend to enforce
NO_INLINE_AUTH/NO_INLINE_CACHEin tagged areas.
- Tagged cross-cut dirs (
- Test-quality holes (not just coverage)
- Failure modes:
- Tests that assert on incidental details; agents reinforce brittle patterns.
- New behaviors only covered via slow end-to-end tests.
- Harness/PR encoding:
- Lane
fast_path_change: for tagged flows, require at least one focused unit/contract test touching new branch. - Simple heuristic: if only snapshot/E2E tests changed for non-UI code, warn and require reviewer comment
TEST-OK.
- Lane
- UX and operability rough edges
- Failure modes:
- CLI/API surfaces grow inconsistent names/flags.
- Operational scripts gain unsafe defaults.
- Harness/PR encoding:
- For CLI/MCP/ops scripts:
- naming rule: new commands must match existing verb/object patterns; simple regex/allowlist.
- safety rule: destructive ops require explicit
--yesand dry-run path; harness rejects otherwise.
- For CLI/MCP/ops scripts:
Keeping rules from becoming a maze
- Group by 3–4 lanes, not dozens of micro-rules:
perf_sensitiveobs_sensitiveflag_changecore_policy_change
- Each lane:
- 1–3 enforced checks,
- 1–2 allowed escape hatches via short justification tags.
- Encode rules at boundary/tag level, not arbitrary dirs.
- Review checklist shrinks to: “Did we pick the right lane? Any lane missing?” rather than re-verifying every rule by hand.