Most current patterns assume that opinionated, token-efficient monolith stacks are the natural home for agent-first workflows; under what conditions would a more Unix-like, CLI-substrate–heavy, service-y architecture actually reduce review bottlenecks and apprenticeship decay—by making interfaces more human- and agent-verifiable—and what concrete redesigns of context bridges and harness flows would be required to test this inversion in a real team?
dhh-agent-first-software-craft
Answer
Here are the conditions under which a Unix-y, CLI-heavy, service-y setup can beat monoliths on review and apprenticeship, plus a testable redesign:
- Conditions where Unix-y/service-y helps
  - Services are thin, stable, text-first interfaces
    - Each service or CLI exposes a small, well-documented contract (flags, JSON I/O, exit codes).
    - Most behavior is scriptable via the CLI substrate, not bespoke HTTP/SDKs.
  - Context per change is smaller than in the monolith
    - A typical PR touches one service/CLI plus glue, not 8 modules across the monolith.
    - Ownership and on-call map clearly to services/CLIs.
  - Harness can drive CLIs reliably
    - Agent harness runs commands, captures stdout/stderr, and diffs outputs.
    - Common flows become small shell/test recipes agents can rerun and extend.
  - Interfaces are more stable than internals
    - Internal churn is high; interfaces stay narrow and versioned.
    - Humans and agents reason mostly in terms of commands and contracts.
  - Team already thinks in pipelines
    - People design systems as composable steps ("ingest → normalize → score → notify").
    - Unix-style composition feels natural, not bolted on.
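The "thin, text-first interface" condition can be made concrete with a sketch. Assuming a hypothetical `normalize` step whose contract is JSON objects in on stdin, JSON objects out on stdout, exit code 2 on malformed input (string splicing stands in for real JSON handling; a production step would use jq or a proper parser):

```shell
# normalize: hypothetical pipeline step with a narrow, text-first contract.
#   stdin : one JSON object per line
#   stdout: the same records with "normalized":true appended
#   exit  : 0 on success, 2 on malformed input
normalize() {
  while IFS= read -r line; do
    case "$line" in
      # Crude shape check: the line must look like a JSON object.
      \{*\}) printf '%s\n' "${line%\}},\"normalized\":true}" ;;
      *)     echo "error: malformed record" >&2; return 2 ;;
    esac
  done
}

# Unix-style composition: printf '{"id":1}\n' | normalize | next_step
```

Because the whole contract is flags, text streams, and exit codes, both a reviewer and an agent harness can verify it by rerunning one line of shell.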
- How this can reduce review bottlenecks
  - Review focuses on contracts, not all internals
    - Seniors review: new commands, flags, schemas, SLOs, and a few golden-path scripts.
    - Implementation inside a service is mostly local, so it carries a lower review load.
  - Diff-first, spec-adjacent review
    - Harness attaches for each PR:
      - CLI/API contract diff (new/changed commands, JSON shapes, error codes).
      - Before/after sample runs (inputs/outputs) for key commands.
    - Reviewers skim contracts and examples instead of large code diffs.
  - Lane-based risk routing
    - Lanes by interface impact:
      - local_impl: internal-only change; light review.
      - cli_contract_change: senior review + scenario runs.
      - cross_service_pipe: extra checks on end-to-end scripts.
    - This keeps the review bottleneck on the few PRs that change interfaces.
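Lane routing can be as small as a path classifier. A minimal sketch, assuming a hypothetical repo layout where services/<name>/cli/ holds each service's public contract and pipelines/ holds cross-service scripts:

```shell
# route_lane: infer a PR's review lane from its changed file paths.
# Layout assumptions (hypothetical): services/<name>/cli/ is the public
# contract; pipelines/ holds end-to-end scripts. Everything else is internal.
route_lane() {
  lane=local_impl
  for f in "$@"; do
    case "$f" in
      pipelines/*)      echo cross_service_pipe; return 0 ;;  # highest-risk lane wins
      services/*/cli/*) lane=cli_contract_change ;;
    esac
  done
  echo "$lane"
}
```

The point of the sketch: because interfaces live at known paths, lane inference is a ten-line script rather than a judgment call per PR.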
- How this can reduce apprenticeship decay
  - Juniors learn via commands and scripts
    - Onboarding focuses on: "How do I run X flow end-to-end? What does each step guarantee?"
    - They can read and edit pipeline scripts without deep framework expertise.
  - Agents expose more of the reasoning surface
    - Harness cards show which commands were run, with what flags, and how outputs changed.
    - Juniors see concrete sequences instead of opaque agent magic inside a big monolith.
  - Easier, safer sandboxes
    - CLI flows are easy to run in local/dev environments with fixture data.
    - Juniors experiment by composing existing commands, then promote successful scripts.
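A sandbox recipe of the kind a junior would write and promote might look like the sketch below. The steps are stubbed so the example is self-contained; in the real setup each would be one of the team's service CLIs:

```shell
# A junior's sandbox recipe: compose existing steps over fixture data,
# then promote the script once it works. All three steps are stubs here.
ingest()    { cat "$@"; }                          # read fixture records
normalize() { sed 's/}$/,"normalized":true}/'; }   # tag each record (sketch)
score()     { awk '{ print NR ": " $0 }'; }        # rank by arrival order

run_flow() {
  printf '{"id":1}\n{"id":2}\n' | ingest | normalize | score
}
```

Nothing here requires framework knowledge: the apprentice reads three one-line steps, swaps one out, and reruns the flow.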
- Required context-bridge redesign
  - Interface-first context bridges
    - For each service/CLI, the bridge exposes:
      - Command list, usage, examples.
      - Input/output schemas or JSON samples.
      - SLOs and side effects.
    - Agents default to reasoning at the command/contract level.
  - Contract-aware diff views
    - For each PR, the context bridge computes:
      - Added/removed/changed commands.
      - Changed flags and JSON fields.
      - Updated scripts/workflows that use them.
    - Harness surfaces this as a 1–2 page "contract diff" for humans and agents.
  - Pipeline maps as first-class context
    - Maintain a simple map: which commands/services form each named flow.
    - Bridge gives agents this map and marks which steps are hot paths or high risk.
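The contract-diff computation has a cheap first approximation: diff the flag sets extracted from a CLI's help text at base vs head. A sketch, assuming a hypothetical `mycli` whose `--help` output has been captured to files (in CI, the two dumps would be generated from the base and head commits):

```shell
# contract_diff: surface added/removed flags between two captured help
# dumps (e.g. `mycli --help` at base vs head). Tokens starting with "--"
# are treated as the flag contract; a real bridge would also diff schemas.
contract_diff() {
  grep -o -- '--[a-z][a-z-]*' "$1" | sort -u > "${TMPDIR:-/tmp}/base.$$"
  grep -o -- '--[a-z][a-z-]*' "$2" | sort -u > "${TMPDIR:-/tmp}/head.$$"
  comm -13 "${TMPDIR:-/tmp}/base.$$" "${TMPDIR:-/tmp}/head.$$" | sed 's/^/added:   /'
  comm -23 "${TMPDIR:-/tmp}/base.$$" "${TMPDIR:-/tmp}/head.$$" | sed 's/^/removed: /'
  rm -f "${TMPDIR:-/tmp}/base.$$" "${TMPDIR:-/tmp}/head.$$"
}
```

Output like `added: --format` / `removed: --verbose` is exactly the 1–2 page artifact reviewers skim instead of the code diff.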
- Required harness-flow redesign
  - Sidecar agent loop over CLIs
    - Agents work in a sidecar terminal:
      - Propose/modify code.
      - Run CLI commands.
      - Capture stdout/JSON and summarize.
    - All steps are logged as a short, replayable script.
  - Script- and contract-centric PR artifacts
    - For each PR, the harness auto-generates:
      - script_card.md: key commands and example invocations affected.
      - contract_card.md: CLI/API shape changes, with before/after samples.
      - risk_lane: inferred lane (local_impl, cli_contract_change, etc.).
  - Simple verification recipes per lane
    - local_impl: run unit tests for the service; quick CLI smoke test.
    - cli_contract_change: run a small scenario suite (3–5 canned inputs) and verify JSON diffs.
    - cross_service_pipe: run the end-to-end pipeline script with test data; check invariants.
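The cli_contract_change recipe (replay canned inputs, diff against golden outputs) is small enough to sketch directly. Assuming a hypothetical fixture layout of $dir/in.N files paired with $dir/golden.N files:

```shell
# scenario_check: replay each canned input through a command and diff the
# result against its golden output. Fixture layout (hypothetical):
#   $dir/in.N  -> input for scenario N
#   $dir/golden.N -> expected output for scenario N
scenario_check() {
  cmd=$1; dir=$2; rc=0
  for f in "$dir"/in.*; do
    golden="$dir/golden.${f##*.}"          # in.3 pairs with golden.3
    if "$cmd" < "$f" | diff - "$golden" > /dev/null; then
      echo "ok:   ${f##*/}"
    else
      echo "FAIL: ${f##*/}"; rc=1
    fi
  done
  return $rc
}
```

Because the recipe is itself a script, agents can rerun it after every change and attach the ok/FAIL lines to the PR's contract card.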
- How to test this inversion on a real team
  - Choose a contained slice
    - Pick 1–2 flows inside an existing monolith (e.g., reporting or ETL) and peel them into:
      - 2–4 small services with clear CLIs.
      - 1–2 pipeline scripts.
  - Run an A/B on review and learning
    - For 6–8 weeks, compare monolith changes vs CLI/service changes on:
      - Review time per PR.
      - Number of review comments on behavior/contract vs style.
      - Defects per 100 PRs for that flow.
    - And on apprentice learning:
      - Time for a junior to complete a small change solo.
      - Their ability to explain the flow afterward.
  - Keep agent capabilities symmetric
    - Same models, same reviewers.
    - Only change: monolith vs CLI/service architecture, plus harness/bridge tweaks.
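The headline A/B metric, review time per PR per arm, reduces to one awk pass once the timestamps are exported. A sketch over an assumed CSV of "arm,opened_epoch,approved_epoch" (producing that CSV from the team's forge is left to existing tooling):

```shell
# review_time: mean hours from PR-opened to first-approval, grouped by
# experiment arm, from a CSV of "arm,opened_epoch,approved_epoch".
review_time() {
  awk -F, '{ sum[$1] += ($3 - $2) / 3600; n[$1]++ }
           END { for (a in sum) printf "%s %.1fh\n", a, sum[a] / n[a] }' "$1"
}
```

Keeping the analysis this dumb is deliberate: both arms are measured by the same script, so the comparison cannot drift with tooling differences.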
- When this likely fails
  - Services expose wide, bespoke APIs instead of narrow CLIs.
  - There is no shared pipeline map; flows are ad hoc cross-service calls.
  - The harness cannot reliably run commands in realistic environments.
  - The team lacks capacity to keep contracts and docs accurate.
Under the right conditions—narrow, stable CLIs; good pipeline maps; contract-first context bridges; script- and contract-centric harness flows—a Unix-like, service-y stack can tighten review focus and make agent behavior and apprenticeship more transparent than a large monolith.