For teams that have adopted implementation abundance and a craft-maturity ladder, what concrete changes in staffing and role design (e.g., fewer generalist seniors, more “systems crafters” paired with multiple operators, or designer-owned harness flows with engineer custodians) most reliably expand the ambition frontier without increasing incident rates, and how could we instrument PRs, incidents, and experiment volume to tell whether a given configuration is actually working?
dhh-agent-first-software-craft
Answer
- Likely-effective staffing patterns
- Pattern A: Systems crafters + operators
- 1 senior "system crafter" (Tier 3) owns boundaries, harness rules, and high-risk changes.
- 2–4 Tier 1–2 operators run agents, iterate on diffs, and handle low- and medium-risk work.
- Use for monolith / opinionated stacks and high feature volume.
- Pattern B: Designer-owned harness, engineer custodians
- Designer/product shaper owns CLI/MCP flows, prompts, and UX experiments.
- Senior engineer custodian owns boundaries, safety gates, and performance.
- Use where UX/workflow experiments drive value.
- Pattern C: Contract owner + service locals (for fragmented stacks)
- One senior owns cross-service contracts/flows; multiple team members handle local changes.
- Agents constrained by those contracts.
- Common shifts vs. traditional staffing
- Fewer "full-stack generalist" seniors doing all steps themselves.
- More seniors concentrated on:
- boundary design and refactors
- harness/contract rules
- change plans for risky work
- More juniors/mids operating agents within those lanes.
- Guardrails to keep incidents flat
- Keep the review bottleneck staffed by high-craft roles:
- All Class 2–3 changes (schema, money, data backfills, cross-system flows) require:
- system crafter / custodian review
- explicit test or contract updates
- Use the craft ladder for routing:
- Tier 1: local features, well-tested paths, no contracts/flags.
- Tier 2: can touch boundaries with template change plans.
- Tier 3: owns cross-boundary changes, harness flows, and irreversible tools.
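The routing above can be sketched as a small gate function. This is a minimal illustration, assuming hypothetical gate names and the tier/risk-class numbering from this answer; it is not an existing API.

```python
def required_gates(author_tier: int, risk_class: int) -> list[str]:
    """Return the review gates a PR must pass before merge (illustrative names)."""
    gates = ["ci"]
    if risk_class >= 2:
        # Class 2-3 changes (schema, money, backfills, cross-system flows)
        # always require a system crafter / custodian and test or contract updates.
        gates += ["system_crafter_review", "test_or_contract_update"]
    if risk_class > author_tier:
        # Author is operating above their lane: escalate to a Tier 3 owner.
        gates.append("tier3_signoff")
    return gates
```

Encoding the routing as data rather than tribal knowledge makes it auditable: you can later join gate requirements against actual reviewer tiers on merged PRs.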
- Instrumentation to see if a configuration works
PR-level
- Tag PRs by:
- author craft tier
- staffing pattern (e.g., "sys-crafter+ops", "designer-owned-harness")
- change risk class (0–3, from existing change-management model)
- Track per pattern and tier:
- incident-linked PR rate
- rework per PR (follow-up fixes within N days)
- cross-boundary edits per PR
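The PR tags above can be captured in a simple record and aggregated per pattern. A minimal sketch, assuming illustrative field names rather than an existing schema:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class PRRecord:
    author_tier: int       # 1-3, from the craft ladder
    pattern: str           # e.g. "sys-crafter+ops", "designer-owned-harness"
    risk_class: int        # 0-3, from the change-management model
    incident_linked: bool  # later tied to an incident?
    rework_prs: int        # follow-up fixes within N days

def incident_linked_rate(prs: list[PRRecord]) -> dict[str, float]:
    """Incident-linked PR rate per staffing pattern."""
    totals, linked = defaultdict(int), defaultdict(int)
    for pr in prs:
        totals[pr.pattern] += 1
        linked[pr.pattern] += pr.incident_linked
    return {p: linked[p] / totals[p] for p in totals}
```

The same shape works for rework rate or cross-boundary edit counts: swap the numerator, keep the per-pattern grouping.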
Incident-level
- For each incident, log:
- which pattern/staffing owned the change
- author tier and reviewer tier
- whether harness, contract, or flow was changed
- Compute, per pattern:
- incidents / 100 PRs by risk class
- MTTR and blast radius
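Incidents per 100 PRs by risk class falls out of joining the two tag streams. A sketch, assuming both PRs and incidents are reduced to (pattern, risk_class) tuples as described above:

```python
from collections import Counter

def incidents_per_100_prs(pr_tags, incident_tags):
    """pr_tags / incident_tags: iterables of (pattern, risk_class) tuples.

    Returns incidents per 100 PRs keyed by (pattern, risk_class)."""
    pr_counts = Counter(pr_tags)
    inc_counts = Counter(incident_tags)
    return {key: 100 * inc_counts[key] / n for key, n in pr_counts.items()}
```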
Ambition / experiment volume
- Define cheap, trackable proxies:
- count of new flows/features per month
- count of safe experiments (feature flags, A/Bs, non-destructive jobs)
- share of work that is "new capability" vs. "maintenance" in tickets
- Compare per pattern, normalized by headcount.
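The ambition proxies above reduce to a couple of ratios per pattern. A minimal sketch, assuming hypothetical ticket labels ("new-capability" vs. "maintenance") and a monthly experiment count:

```python
def ambition_stats(ticket_labels: list[str], experiments: int, headcount: int) -> dict[str, float]:
    """Headcount-normalized experiment volume and new-capability share of work."""
    new = sum(1 for label in ticket_labels if label == "new-capability")
    return {
        "experiments_per_head": experiments / headcount,
        "new_capability_share": new / len(ticket_labels),
    }
```

Normalizing by headcount matters when comparing patterns: a larger team landing more experiments in absolute terms may still be less ambitious per person.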
- Simple success checks per configuration
- Pattern A (system crafter + operators) is working if:
- incidents per 100 PRs in risky classes stay flat or drop vs. before
- experiment count and cross-boundary improvements rise
- Tier 1–2 throughput rises, and Tier 3 PR count is modest but touches many boundaries.
- Pattern B (designer-owned harness) is working if:
- more UX/flow experiments land per cycle
- harness-change-linked incidents stay rare
- engineers report less time writing glue and more time on boundaries/infra.
- Pattern C (contract owner + locals) is working if:
- cross-service incident rate drops
- contract-change PRs are few but heavily reviewed
- agents mostly touch leaf services, not raw cross-service calls.
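The common core of all three success checks is "risky-class incidents flat or down, ambition up vs. baseline." A sketch of that comparison, with illustrative metric names and no calibrated thresholds:

```python
def pattern_is_working(baseline: dict[str, float], current: dict[str, float],
                       incident_slack: float = 0.0) -> bool:
    """Working = risky-class incident rate flat or down, experiment volume up."""
    incidents_ok = (current["incidents_per_100_risky_prs"]
                    <= baseline["incidents_per_100_risky_prs"] + incident_slack)
    ambition_up = current["experiments_per_month"] > baseline["experiments_per_month"]
    return incidents_ok and ambition_up
```

The pattern-specific signals (harness-linked incidents for B, cross-service incident rate for C) slot in as extra metric keys on top of this shared check.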
- How to start empirically
- Pick one pattern per team for a quarter.
- Add minimal tags to PR template (tier, pattern, risk class).
- Add one field to incident form (pattern + tier).
- Review stats monthly and adjust staffing (e.g., rebalance system crafters, move harness ownership) based on:
- ambition/experiment lift vs. baseline
- incidents per risk class vs. baseline.