Most current patterns assume that opinionated, token-efficient monolith stacks are the natural home for agent-first workflows; under what conditions would a more Unix-like, CLI-substrate–heavy, service-y architecture actually reduce review bottlenecks and apprenticeship decay—by making interfaces more human- and agent-verifiable—and what concrete redesigns of context bridges and harness flows would be required to test this inversion in a real team?
dhh-agent-first-software-craft
Answer
Here are the conditions under which a Unix-y, CLI-heavy, service-y setup can beat monoliths on review and apprenticeship, plus a testable redesign:
- Conditions where Unix-y/service-y helps
  - Services are thin, stable, text-first interfaces
    - Each service or CLI exposes a small, well-documented contract (flags, JSON I/O, exit codes).
    - Most behavior is scriptable via the CLI substrate, not bespoke HTTP/SDKs.
  - Context per change is smaller than in the monolith
    - A typical PR touches one service/CLI plus glue, not 8 modules across the monolith.
    - Ownership and on-call map clearly to services/CLIs.
  - Harness can drive CLIs reliably
    - Agent harness runs commands, captures stdout/stderr, and diffs outputs.
    - Common flows become small shell/test recipes agents can rerun and extend.
  - Interfaces are more stable than internals
    - Internal churn is high; interfaces stay narrow and versioned.
    - Humans and agents reason mostly in terms of commands and contracts.
  - Team already thinks in pipelines
    - People design systems as composable steps ("ingest → normalize → score → notify").
    - Unix-style composition feels natural, not bolted on.
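The "thin, text-first interface" condition can be made concrete with a sketch. Assuming a hypothetical `normalize` step whose contract is JSON objects in on stdin, JSON objects out on stdout, exit code 2 on malformed input (string splicing stands in for real JSON handling; a production step would use jq or a proper parser):

```shell
# normalize: hypothetical pipeline step with a narrow, text-first contract.
#   stdin : one JSON object per line
#   stdout: the same records with "normalized":true appended
#   exit  : 0 on success, 2 on malformed input
normalize() {
  while IFS= read -r line; do
    case "$line" in
      # Crude shape check: the line must look like a JSON object.
      \{*\}) printf '%s\n' "${line%\}},\"normalized\":true}" ;;
      *)     echo "error: malformed record" >&2; return 2 ;;
    esac
  done
}

# Unix-style composition: printf '{"id":1}\n' | normalize | next_step
```

Because the whole contract is flags, text streams, and exit codes, both a reviewer and an agent harness can verify it by rerunning one line of shell.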
- How this can reduce review bottlenecks
  - Review focuses on contracts, not all internals
    - Seniors review: new commands, flags, schemas, SLOs, and a few golden-path scripts.
    - Implementation inside a service is mostly local, so it carries a lower review load.
  - Diff-first, spec-adjacent review
    - Harness attaches for each PR:
      - CLI/API contract diff (new/changed commands, JSON shapes, error codes).
      - Before/after sample runs (inputs/outputs) for key commands.
    - Reviewers skim contracts and examples instead of large code diffs.
  - Lane-based risk routing
    - Lanes by interface impact:
      - local_impl: internal-only change; light review.
      - cli_contract_change: senior review + scenario runs.
      - cross_service_pipe: extra checks on end-to-end scripts.
    - This keeps the review bottleneck on the few PRs that change interfaces.
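Lane routing can be as small as a path classifier. A minimal sketch, assuming a hypothetical repo layout where services/<name>/cli/ holds each service's public contract and pipelines/ holds cross-service scripts:

```shell
# route_lane: infer a PR's review lane from its changed file paths.
# Layout assumptions (hypothetical): services/<name>/cli/ is the public
# contract; pipelines/ holds end-to-end scripts. Everything else is internal.
route_lane() {
  lane=local_impl
  for f in "$@"; do
    case "$f" in
      pipelines/*)      echo cross_service_pipe; return 0 ;;  # highest-risk lane wins
      services/*/cli/*) lane=cli_contract_change ;;
    esac
  done
  echo "$lane"
}
```

The point of the sketch: because interfaces live at known paths, lane inference is a ten-line script rather than a judgment call per PR.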
- How this can reduce apprenticeship decay
  - Juniors learn via commands and scripts
    - Onboarding focuses on: "How do I run X flow end-to-end? What does each step guarantee?"
    - They can read and edit pipeline scripts without deep framework expertise.
  - Agents expose more of the reasoning surface
    - Harness cards show which commands were run, with what flags, and how outputs changed.
    - Juniors see concrete sequences instead of opaque agent magic inside a big monolith.
  - Easier, safer sandboxes
    - CLI flows are easy to run in local/dev environments with fixture data.
    - Juniors experiment by composing existing commands, then promote successful scripts.
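A sandbox recipe of the kind a junior would write and promote might look like the sketch below. The steps are stubbed so the example is self-contained; in the real setup each would be one of the team's service CLIs:

```shell
# A junior's sandbox recipe: compose existing steps over fixture data,
# then promote the script once it works. All three steps are stubs here.
ingest()    { cat "$@"; }                          # read fixture records
normalize() { sed 's/}$/,"normalized":true}/'; }   # tag each record (sketch)
score()     { awk '{ print NR ": " $0 }'; }        # rank by arrival order

run_flow() {
  printf '{"id":1}\n{"id":2}\n' | ingest | normalize | score
}
```

Nothing here requires framework knowledge: the apprentice reads three one-line steps, swaps one out, and reruns the flow.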
- Required context-bridge redesign
  - Interface-first context bridges
    - For each service/CLI, the bridge exposes:
      - Command list, usage, examples.
      - Input/output schemas or JSON samples.
      - SLOs and side effects.
    - Agents default to reasoning at the command/contract level.
  - Contract-aware diff views
    - For each PR, the context bridge computes:
      - Added/removed/changed commands.
      - Changed flags and JSON fields.
      - Updated scripts/workflows that use them.
    - Harness surfaces this as a 1–2 page "contract diff" for humans and agents.
  - Pipeline maps as first-class context
    - Maintain a simple map: which commands/services form each named flow.
    - Bridge gives agents this map and marks which steps are hot paths or high risk.
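The contract-diff computation has a cheap first approximation: diff the flag sets extracted from a CLI's help text at base vs head. A sketch, assuming a hypothetical `mycli` whose `--help` output has been captured to files (in CI, the two dumps would be generated from the base and head commits):

```shell
# contract_diff: surface added/removed flags between two captured help
# dumps (e.g. `mycli --help` at base vs head). Tokens starting with "--"
# are treated as the flag contract; a real bridge would also diff schemas.
contract_diff() {
  grep -o -- '--[a-z][a-z-]*' "$1" | sort -u > "${TMPDIR:-/tmp}/base.$$"
  grep -o -- '--[a-z][a-z-]*' "$2" | sort -u > "${TMPDIR:-/tmp}/head.$$"
  comm -13 "${TMPDIR:-/tmp}/base.$$" "${TMPDIR:-/tmp}/head.$$" | sed 's/^/added:   /'
  comm -23 "${TMPDIR:-/tmp}/base.$$" "${TMPDIR:-/tmp}/head.$$" | sed 's/^/removed: /'
  rm -f "${TMPDIR:-/tmp}/base.$$" "${TMPDIR:-/tmp}/head.$$"
}
```

Output like `added: --format` / `removed: --verbose` is exactly the 1–2 page artifact reviewers skim instead of the code diff.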
- Required harness-flow redesign
  - Sidecar agent loop over CLIs
    - Agents work in a sidecar terminal:
      - Propose/modify code.
      - Run CLI commands.
      - Capture stdout/JSON and summarize.
    - All steps are logged as a short, replayable script.
  - Script- and contract-centric PR artifacts
    - For each PR, the harness auto-generates:
      - script_card.md: key commands and example invocations affected.
      - contract_card.md: CLI/API shape changes, with before/after samples.
      - risk_lane: inferred lane (local_impl, cli_contract_change, etc.).
  - Simple verification recipes per lane
    - local_impl: run unit tests for the service; quick CLI smoke test.
    - cli_contract_change: run a small scenario suite (3–5 canned inputs) and verify JSON diffs.
    - cross_service_pipe: run the end-to-end pipeline script with test data; check invariants.
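The cli_contract_change recipe (replay canned inputs, diff against golden outputs) is small enough to sketch directly. Assuming a hypothetical fixture layout of $dir/in.N files paired with $dir/golden.N files:

```shell
# scenario_check: replay each canned input through a command and diff the
# result against its golden output. Fixture layout (hypothetical):
#   $dir/in.N  -> input for scenario N
#   $dir/golden.N -> expected output for scenario N
scenario_check() {
  cmd=$1; dir=$2; rc=0
  for f in "$dir"/in.*; do
    golden="$dir/golden.${f##*.}"          # in.3 pairs with golden.3
    if "$cmd" < "$f" | diff - "$golden" > /dev/null; then
      echo "ok:   ${f##*/}"
    else
      echo "FAIL: ${f##*/}"; rc=1
    fi
  done
  return $rc
}
```

Because the recipe is itself a script, agents can rerun it after every change and attach the ok/FAIL lines to the PR's contract card.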
- How to test this inversion on a real team
  - Choose a contained slice
    - Pick 1–2 flows inside an existing monolith (e.g., reporting or ETL) and peel them into:
      - 2–4 small services with clear CLIs.
      - 1–2 pipeline scripts.
  - Run an A/B on review and learning
    - For 6–8 weeks, compare monolith changes vs CLI/service changes on:
      - Review time per PR.
      - Number of review comments on behavior/contract vs style.
      - Defects per 100 PRs for that flow.
    - And on apprentice learning:
      - Time for a junior to complete a small change solo.
      - Their ability to explain the flow afterward.
  - Keep agent capabilities symmetric
    - Same models, same reviewers.
    - Only change: monolith vs CLI/service architecture, plus harness/bridge tweaks.
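The headline A/B metric, review time per PR per arm, reduces to one awk pass once the timestamps are exported. A sketch over an assumed CSV of "arm,opened_epoch,approved_epoch" (producing that CSV from the team's forge is left to existing tooling):

```shell
# review_time: mean hours from PR-opened to first-approval, grouped by
# experiment arm, from a CSV of "arm,opened_epoch,approved_epoch".
review_time() {
  awk -F, '{ sum[$1] += ($3 - $2) / 3600; n[$1]++ }
           END { for (a in sum) printf "%s %.1fh\n", a, sum[a] / n[a] }' "$1"
}
```

Keeping the analysis this dumb is deliberate: both arms are measured by the same script, so the comparison cannot drift with tooling differences.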
- When this likely fails
  - Services expose wide, bespoke APIs instead of narrow CLIs.
  - There is no shared pipeline map; flows are ad hoc cross-service calls.
  - The harness cannot reliably run commands in realistic environments.
  - The team lacks capacity to keep contracts and docs accurate.
Under the right conditions—narrow, stable CLIs; good pipeline maps; contract-first context bridges; script- and contract-centric harness flows—a Unix-like, service-y stack can tighten review focus and make agent behavior and apprenticeship more transparent than a large monolith.