In small, implementation-abundant monolith teams that already use boundary tags, drift metrics, and checklist-based review, what concrete failure modes still slip through (e.g., performance cliffs, observability blind spots, “just one more flag” complexity), and how could those be encoded as additional harness rules or PR lanes without turning the craft bar into an unmanageable rule maze?

Answer

The main residual failure modes, each with a compact mitigation:

Failure modes (performance cliffs):
- Innocent-looking queries or N+1 query patterns on newly hot endpoints.
- Extra network hops or cache misuse in critical flows.
- Agents adding “simple” abstractions that add allocations/indirection.
Harness/PR encoding:
- Perf lane: PRs touching tagged hot paths must carry the perf_sensitive label, which triggers:
  - a small LOC cap,
  - a required before/after timing or query-plan snapshot.
- Guardrails:
  - Block new cross-boundary calls from hot paths without a short PERF-JUST: note.
  - Simple static checks: new queries in hot dirs must use existing scopes/loaders.
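
As a rough illustration, the perf lane and the PERF-JUST: guardrail could be sketched as a single harness check. The `PullRequest` shape, the path prefixes, and the LOC cap below are hypothetical placeholders, and detecting new cross-boundary calls is assumed to happen upstream (for example via the existing boundary tags).

```python
from dataclasses import dataclass, field

# Hypothetical values: tune the prefixes and cap to the repo's own boundary tags.
HOT_PATH_PREFIXES = ("app/checkout/", "app/search/")
PERF_LOC_CAP = 150


@dataclass
class PullRequest:
    labels: set[str]
    description: str
    changed_lines: dict[str, int]  # file path -> number of lines changed
    # Paths where the diff introduces a new cross-boundary call (computed upstream).
    new_cross_boundary_calls: list[str] = field(default_factory=list)


def is_hot(path: str) -> bool:
    return path.startswith(HOT_PATH_PREFIXES)


def perf_lane_violations(pr: PullRequest) -> list[str]:
    """Return human-readable violations; an empty list means the lane passes."""
    hot_files = [p for p in pr.changed_lines if is_hot(p)]
    if not hot_files:
        return []  # the lane only applies when a tagged hot path is touched
    violations = []
    if "perf_sensitive" not in pr.labels:
        violations.append("hot path touched without perf_sensitive label")
    hot_loc = sum(pr.changed_lines[p] for p in hot_files)
    if hot_loc > PERF_LOC_CAP:
        violations.append(f"{hot_loc} changed lines in hot paths exceeds cap of {PERF_LOC_CAP}")
    if any(is_hot(p) for p in pr.new_cross_boundary_calls) and "PERF-JUST:" not in pr.description:
        violations.append("new cross-boundary call from hot path needs a PERF-JUST: note")
    return violations
```

Returning a list of messages rather than a boolean lets the harness report every problem in one pass, which keeps the lane to a single round trip for the author.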

Failure modes (observability blind spots):
- New flows with no logs/metrics.
- Existing alerts silently invalidated by renamed fields or tags.
Harness/PR encoding:
- Obs rule: when a PR touches a controller, job, or worker in a tagged area, require either:
  - log/metric change in same diff, or
  - OBS-NOCHANGE: justification.
- Diff check: if an alert or query references fields or labels being changed, require a reviewer to acknowledge with an OBS-ACK comment.
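
A minimal sketch of both observability checks, assuming the harness can already extract the added lines per file and the set of renamed fields from the diff. The path fragments and the regex for what counts as a log or metric call are hypothetical conventions, not something the rules above prescribe.

```python
import re

# Hypothetical conventions: tagged path fragments and what counts as a log/metric call.
OBS_TAGGED = ("controllers/", "jobs/", "workers/")
OBS_CALL = re.compile(r"\b(logger|log|metrics|statsd)\.\w+\(")


def obs_rule_passes(added_lines: dict[str, list[str]], description: str) -> bool:
    """added_lines maps each changed file path to the lines the diff adds."""
    touched = [p for p in added_lines if any(tag in p for tag in OBS_TAGGED)]
    if not touched:
        return True  # rule only applies to tagged areas
    if "OBS-NOCHANGE:" in description:
        return True  # escape hatch: the author justified the lack of change
    # Otherwise some added line in the diff must contain a log or metric call.
    return any(OBS_CALL.search(line) for lines in added_lines.values() for line in lines)


def alerts_needing_ack(renamed_fields: set[str], alert_queries: dict[str, str]) -> list[str]:
    """Return names of alerts whose query text mentions a field being renamed."""
    return sorted(
        name
        for name, query in alert_queries.items()
        if any(re.search(rf"\b{re.escape(f)}\b", query) for f in renamed_fields)
    )
```

Any alert name returned by `alerts_needing_ack` would then block the merge until an OBS-ACK comment appears on the PR.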

Failure modes (“just one more flag” complexity):
- Conditionals proliferate around feature flags / env vars.
- Flag lifecycle (rollout → cleanup) never completed.
Harness/PR encoding:
- Flag lane: new flag requires:
  - owner,
  - expiry or cleanup condition,
  - an entry in the single source-of-truth flag file.
- Rule: if a PR adds a flag branch to code that already contains two or more flag checks, fail with a suggestion to extract a strategy object or retire old flags first.
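
The flag lane might look something like the following sketch. It assumes registry entries are plain dicts with hypothetical `owner`, `expires` (ISO date), and `cleanup_condition` keys, and that flag checks go through a hypothetical `flag_enabled(...)` helper; both would need to be swapped for whatever the codebase actually uses.

```python
import re
from datetime import date

# Hypothetical flag helper name; substitute the call the codebase uses.
FLAG_CHECK = re.compile(r"\bflag_enabled\(")
MAX_EXISTING_FLAG_CHECKS = 2


def flag_entry_problems(entry: dict) -> list[str]:
    """Validate one entry from the single source-of-truth flag registry."""
    problems = []
    if not entry.get("owner"):
        problems.append("missing owner")
    if not (entry.get("expires") or entry.get("cleanup_condition")):
        problems.append("missing expiry or cleanup condition")
    return problems


def is_expired(entry: dict, today: date) -> bool:
    """True if the entry has an expiry date that is already in the past."""
    expires = entry.get("expires")
    return bool(expires) and date.fromisoformat(expires) < today


def flag_density_violation(old_source: str, new_source: str) -> bool:
    """True if the change adds a flag check to a file that already had 2+."""
    before = len(FLAG_CHECK.findall(old_source))
    after = len(FLAG_CHECK.findall(new_source))
    return before >= MAX_EXISTING_FLAG_CHECKS and after > before
```

Running `is_expired` over the registry on a schedule is one way to surface flags whose rollout finished but whose cleanup never happened.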

Failure modes (cross-cutting policy drift):
- Observability, auth, and caching added ad hoc in multiple layers.
- Subtle policy differences between paths that should match.
Harness/PR encoding:
- Tagged cross-cut dirs (auth, billing, permissions):
  - block new inline checks in controllers/services; require use of central helper.
- Extend the existing drift metric to enforce NO_INLINE_AUTH / NO_INLINE_CACHE in tagged areas.
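
One way to approximate NO_INLINE_AUTH for Python sources is a small AST scan that counts conditions reading auth attributes directly. The attribute names below are hypothetical markers of an inline check; a real rule would be tuned to the central helper's actual API, and a NO_INLINE_CACHE variant would follow the same shape.

```python
import ast

# Hypothetical markers of an inline permission check; the central helper is
# assumed to be the only code allowed to read these attributes.
INLINE_AUTH_ATTRS = {"is_admin", "role", "permissions"}


def inline_auth_lines(source: str) -> list[int]:
    """Return line numbers where a condition reads an auth attribute directly."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp, ast.While)):
            # Only inspect the condition, not the body, of each branch.
            for sub in ast.walk(node.test):
                if isinstance(sub, ast.Attribute) and sub.attr in INLINE_AUTH_ATTRS:
                    hits.add(sub.lineno)
    return sorted(hits)


def no_inline_auth_count(files: dict[str, str]) -> int:
    """Drift metric: total inline auth checks across tagged controller/service files."""
    return sum(len(inline_auth_lines(src)) for src in files.values())
```

Comparing `no_inline_auth_count` on the base and head revisions lets the harness fail only when the count rises, so legacy inline checks do not block unrelated PRs.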

Failure modes (test brittleness):
- Tests that assert on incidental details; agents reinforce brittle patterns.
- New behaviors only covered via slow end-to-end tests.
Harness/PR encoding:
- Lane fast_path_change: for tagged flows, require at least one focused unit or contract test that exercises the new branch.
- Simple heuristic: if only snapshot/E2E tests changed for non-UI code, warn and require reviewer comment TEST-OK.
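
The snapshot/E2E heuristic can be sketched purely from changed paths. The directory conventions here (`tests/`, `tests/e2e/`, `__snapshots__/`, `ui/`, `frontend/`) are assumptions for illustration, and this covers only the warning heuristic, not the requirement that a focused test exercises the new branch.

```python
# Hypothetical path conventions; adjust to the repo's layout.
SLOW_TEST_MARKERS = ("tests/e2e/", "__snapshots__/")
UI_MARKERS = ("ui/", "frontend/")


def is_test(path: str) -> bool:
    return path.startswith("tests/") or "__snapshots__/" in path


def is_slow_test(path: str) -> bool:
    return any(marker in path for marker in SLOW_TEST_MARKERS)


def needs_test_ok(changed_paths: list[str], review_comments: list[str]) -> bool:
    """True if the PR should be held until a reviewer comments TEST-OK."""
    tests = [p for p in changed_paths if is_test(p)]
    code = [p for p in changed_paths if not is_test(p)]
    non_ui_code = [p for p in code if not any(m in p for m in UI_MARKERS)]
    # Every changed test is a snapshot or E2E test, and at least one test changed.
    only_slow = bool(tests) and all(is_slow_test(p) for p in tests)
    acked = any("TEST-OK" in c for c in review_comments)
    return bool(non_ui_code) and only_slow and not acked
```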

Failure modes (CLI and ops surface drift):
- CLI/API surfaces grow inconsistent names/flags.
- Operational scripts gain unsafe defaults.
Harness/PR encoding:
- For CLI/MCP/ops scripts:
  - naming rule: new commands must match existing verb/object patterns; simple regex/allowlist.
  - safety rule: destructive ops require explicit --yes and dry-run path; harness rejects otherwise.
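
Both rules fit in one small check, sketched below. The verb and object allowlists and the `verb-object` naming shape are hypothetical; in practice they would be derived from the commands the tool already ships.

```python
import re

# Hypothetical allowlists; derive them from the existing command set.
VERBS = ("list", "show", "create", "delete", "sync")
OBJECTS = ("user", "job", "cache", "index")
COMMAND_NAME = re.compile(rf"^({'|'.join(VERBS)})-({'|'.join(OBJECTS)})$")
DESTRUCTIVE_VERBS = {"delete"}


def command_violations(name: str, flags: set[str]) -> list[str]:
    """Check a new command's name and declared flags against both rules."""
    match = COMMAND_NAME.match(name)
    if not match:
        return [f"{name}: does not match verb-object naming pattern"]
    violations = []
    if match.group(1) in DESTRUCTIVE_VERBS:
        # Destructive commands need explicit confirmation and a dry-run path.
        for required in ("--yes", "--dry-run"):
            if required not in flags:
                violations.append(f"{name}: destructive command must support {required}")
    return violations
```

For example, `command_violations("delete-cache", {"--yes"})` reports the missing `--dry-run` flag, while `command_violations("list-user", set())` passes.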

Keeping rules from becoming a maze

Group rules into 3–4 lanes rather than dozens of micro-rules:
1. perf_sensitive
2. obs_sensitive
3. flag_change
4. core_policy_change
Each lane has:
- 1–3 enforced checks,
- 1–2 allowed escape hatches via short justification tags.
Encode rules at boundary/tag level, not arbitrary dirs.
Review checklist shrinks to: “Did we pick the right lane? Any lane missing?” rather than re-verifying every rule by hand.
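
To keep the lane budget honest, the lane definitions themselves can enforce the limits. This is a sketch under stated assumptions: the check names are hypothetical labels for the rules above, and FLAG-JUST: and POLICY-JUST: are invented placeholder tags, since only PERF-JUST: and OBS-NOCHANGE: are named earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str
    checks: tuple[str, ...]          # enforced checks, 1-3 per lane
    escape_hatches: tuple[str, ...]  # justification tags, 1-2 per lane

    def __post_init__(self):
        # Reject lane definitions that exceed the budget at import time.
        if not 1 <= len(self.checks) <= 3:
            raise ValueError(f"{self.name}: lanes take 1-3 checks")
        if not 1 <= len(self.escape_hatches) <= 2:
            raise ValueError(f"{self.name}: lanes take 1-2 escape hatches")


# Check names and the FLAG-JUST:/POLICY-JUST: tags are hypothetical.
LANES = (
    Lane("perf_sensitive", ("loc_cap", "timing_or_query_plan"), ("PERF-JUST:",)),
    Lane("obs_sensitive", ("log_or_metric_in_diff", "alert_field_ack"), ("OBS-NOCHANGE:",)),
    Lane("flag_change", ("owner_and_expiry", "registry_touched", "flag_density"), ("FLAG-JUST:",)),
    Lane("core_policy_change", ("no_inline_auth", "no_inline_cache"), ("POLICY-JUST:",)),
)


def lanes_for(labels: set[str]) -> list[str]:
    """Return the lanes a PR has opted into via its labels."""
    return [lane.name for lane in LANES if lane.name in labels]
```

Because the limits live in `__post_init__`, adding a fourth check to a lane fails immediately, which forces the conversation about retiring or merging a rule before the maze grows.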