Most current patterns assume that opinionated, token-efficient monolith stacks are the natural home for agent-first workflows; under what conditions would a more Unix-like, CLI-substrate–heavy, service-y architecture actually reduce review bottlenecks and apprenticeship decay—by making interfaces more human- and agent-verifiable—and what concrete redesigns of context bridges and harness flows would be required to test this inversion in a real team?

dhh-agent-first-software-craft

Answer

Here are the conditions under which a Unix-like, CLI-heavy, service-oriented setup can beat a monolith on review and apprenticeship, plus a concrete, testable redesign:

  1. Conditions where Unix-y/service-y helps
  • Services are thin, stable, text-first interfaces

    • Each service or CLI exposes a small, well-documented contract (flags, JSON I/O, exit codes).
    • Most behavior is scriptable via the CLI substrate, not bespoke HTTP/SDKs.
  • Context per change is smaller than in the monolith

    • A typical PR touches one service/CLI plus glue, not 8 modules across the monolith.
    • Ownership and on-call map clearly to services/CLIs.
  • Harness can drive CLIs reliably

    • Agent harness runs commands, captures stdout/stderr, and diffs outputs.
    • Common flows become small shell/test recipes agents can rerun and extend.
  • Interfaces are more stable than internals

    • Internal churn is high; interfaces stay narrow and versioned.
    • Humans and agents reason mostly in terms of commands and contracts.
  • Team already thinks in pipelines

    • People design systems as composable steps ("ingest → normalize → score → notify").
    • Unix-style composition feels natural, not bolted on.
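To make the "small, well-documented contract" condition concrete, here is a minimal sketch of one pipeline step as a CLI with JSON I/O and explicit exit codes. The `normalize` command, its flag, and its exit-code convention are all hypothetical; the point is that the contract surface (flags, JSON shapes, exit codes) is small enough to review and test directly:

```python
import argparse
import json

def normalize_records(records, lowercase_keys=False):
    """Pure core of the step, kept separate so it is easy to unit-test."""
    out = []
    for rec in records:
        if lowercase_keys:
            rec = {k.lower(): v for k, v in rec.items()}
        out.append(rec)
    return out

def run(argv, stdin_text):
    """CLI surface: parse flags, read JSON from stdin text, and return
    (exit_code, stdout_text, stderr_text) per the command's contract."""
    parser = argparse.ArgumentParser(prog="normalize")
    parser.add_argument("--lowercase-keys", action="store_true",
                        help="normalize record keys to lowercase")
    args = parser.parse_args(argv)
    try:
        records = json.loads(stdin_text)
    except json.JSONDecodeError as err:
        return 2, "", json.dumps({"error": str(err)})  # contract: 2 = bad input
    out = normalize_records(records, lowercase_keys=args.lowercase_keys)
    return 0, json.dumps(out), ""  # contract: 0 = success
```

Because the contract is just flags plus JSON plus exit codes, both a reviewer and an agent harness can verify it by running the command, not by reading internals.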
  2. How this can reduce review bottlenecks
  • Review focuses on contracts, not all internals

    • Seniors review: new commands, flags, schemas, SLOs, and a few golden-path scripts.
    • Implementation inside a service stays mostly local and carries a lower review load.
  • Diff-first, spec-adjacent review

    • Harness attaches for each PR:
      • CLI/API contract diff (new/changed commands, JSON shapes, error codes).
      • Before/after sample runs (inputs/outputs) for key commands.
    • Reviewers skim contracts and examples instead of large code diffs.
  • Lane-based risk routing

    • Lanes by interface impact:
      • local_impl: internal-only change; light review.
      • cli_contract_change: senior review + scenario runs.
      • cross_service_pipe: extra checks on end-to-end scripts.
    • This keeps the review bottleneck on the few PRs that change interfaces.
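A minimal sketch of the lane routing above, using the lane names from the text. The file-path heuristics (a `cli.py` file, a `schema.json`, a `/contracts/` directory, top-level service directories) are assumptions, not a prescribed layout:

```python
def route_lane(changed_paths):
    """Infer a review lane for a PR from the files it touches."""
    touches_contract = any(
        p.endswith(("cli.py", "schema.json")) or "/contracts/" in p
        for p in changed_paths
    )
    # Assumed layout: the first path segment names the owning service.
    services = {p.split("/", 1)[0] for p in changed_paths if "/" in p}
    if touches_contract and len(services) > 1:
        return "cross_service_pipe"   # extra end-to-end checks
    if touches_contract:
        return "cli_contract_change"  # senior review + scenario runs
    return "local_impl"               # internal-only change; light review
```

The heuristic is deliberately dumb; what matters is that most PRs land in `local_impl` and only interface-touching PRs consume senior review time.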
  3. How this can reduce apprenticeship decay
  • Juniors learn via commands and scripts

    • Onboarding focuses on: "How do I run X flow end-to-end? What does each step guarantee?"
    • They can read and edit pipeline scripts without deep framework expertise.
  • Agents expose more of the reasoning surface

    • Harness cards show: which commands were run, with what flags, and how outputs changed.
    • Juniors see concrete sequences instead of opaque agent magic inside a big monolith.
  • Easier, safer sandboxes

    • CLI flows are easy to run in local/dev with fixture data.
    • Juniors experiment by composing existing commands, then promote successful scripts.
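The "compose existing commands, then promote successful scripts" style can be sketched as a pipeline runner a junior could read and extend. The step names follow the hypothetical "ingest → normalize → score → notify" flow above, and the `--dry-run` flag on the last step is an assumed sandbox safety convention:

```python
import subprocess

# Hypothetical named flow, written as data a junior can edit step by step.
PIPELINE = [
    ["ingest", "--source", "fixtures/orders.json"],
    ["normalize", "--lowercase-keys"],
    ["score", "--model", "baseline"],
    ["notify", "--dry-run"],  # safe in sandboxes: no real side effects
]

def run_pipeline(steps, payload: bytes) -> bytes:
    """Pipe each step's stdout into the next, failing fast on nonzero exit."""
    for cmd in steps:
        proc = subprocess.run(cmd, input=payload, capture_output=True)
        if proc.returncode != 0:
            raise RuntimeError(f"{cmd[0]} failed: {proc.stderr.decode()}")
        payload = proc.stdout
    return payload
```

Promoting a successful experiment then means committing a new `steps` list, which is exactly the kind of diff a senior can review in seconds.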
  4. Required context-bridge redesign
  • Interface-first context bridges

    • For each service/CLI, the bridge exposes:
      • Command list, usage, examples.
      • Input/output schemas or JSON samples.
      • SLOs and side-effects.
    • Agents default to reasoning at the command/contract level.
  • Contract-aware diff views

    • Context bridge computes for a PR:
      • Added/removed/changed commands.
      • Changed flags and JSON fields.
      • Updated scripts/workflows that use them.
    • Harness surfaces this as a 1–2 page "contract diff" for humans and agents.
  • Pipeline maps as first-class context

    • Maintain a simple map: which commands/services form each named flow.
    • Bridge gives agents this map and marks which steps are hot paths or high risk.
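One way the bridge could compute the contract diff described above. The manifest shape (command name mapped to its flags and output fields) is an assumption; a real bridge might derive it from `--help` output or published schemas:

```python
def contract_diff(old: dict, new: dict) -> dict:
    """Compare two contract manifests (command -> {flags, output fields})
    and report added, removed, and changed commands."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            cmd for cmd in old.keys() & new.keys() if old[cmd] != new[cmd]
        ),
    }
```

Rendered as the 1–2 page "contract diff" card, this is what reviewers and agents skim instead of the full code diff.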
  5. Required harness-flow redesign
  • Sidecar agent loop over CLIs

    • Agents work in a sidecar terminal:
      • Propose/modify code.
      • Run CLI commands.
      • Capture stdout/JSON and summarize.
    • All steps logged as a short, replayable script.
  • Script- and contract-centric PR artifacts

    • For each PR, harness auto-generates:
      • script_card.md: key commands and example invocations affected.
      • contract_card.md: CLI/API shape changes, with before/after samples.
      • risk_lane: inferred lane (local_impl, cli_contract_change, etc.).
  • Simple verification recipes per lane

    • local_impl: run unit tests for the service; quick smoke CLI.
    • cli_contract_change: run a small scenario suite (3–5 canned inputs) and verify JSON diffs.
    • cross_service_pipe: run end-to-end pipeline script with test data; check invariants.
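The sidecar requirement that "all steps are logged as a short, replayable script" can be sketched as a thin wrapper that records every command it executes; the class name and the `set -e` replay convention are assumptions:

```python
import shlex
import subprocess

class SidecarLog:
    """Run commands on behalf of an agent while recording each one, so the
    whole session can be replayed later as a plain shell script."""

    def __init__(self):
        self.lines = ["#!/bin/sh", "set -e"]  # replay stops on first failure

    def run(self, cmd):
        self.lines.append(shlex.join(cmd))  # log before executing
        return subprocess.run(cmd, capture_output=True, text=True)

    def script(self) -> str:
        return "\n".join(self.lines) + "\n"
```

The emitted script doubles as the `script_card.md` raw material: a junior (or a reviewer) can rerun exactly what the agent did.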
  6. How to test this inversion on a real team
  • Choose a contained slice

    • Pick 1–2 flows inside an existing monolith (e.g., reporting or ETL) and peel them into:
      • 2–4 small services with clear CLIs.
      • 1–2 pipeline scripts.
  • Run an A/B on review and learning

    • For 6–8 weeks, compare:
      • Monolith changes vs CLI/service changes on:
        • Review time per PR.
        • Number of review comments on behavior/contract vs style.
        • Defects per 100 PRs for that flow.
      • Apprentice learning:
        • Time for a junior to complete a small change solo.
        • Their ability to explain the flow afterward.
  • Keep agent capabilities symmetric

    • Same models, same reviewers.
    • Only change: monolith vs CLI/service architecture + harness/bridge tweaks.
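A sketch of the A/B readout for the metrics above. The record fields (`arm`, `review_hours`, `defects`) are assumed names for data a PR tracker would supply, not a real API:

```python
from statistics import median

def summarize(prs):
    """Per-arm medians and rates: one input record per PR, arms being the
    hypothetical labels "monolith" and "cli_service"."""
    out = {}
    for arm in {p["arm"] for p in prs}:
        rows = [p for p in prs if p["arm"] == arm]
        out[arm] = {
            "median_review_hours": median(r["review_hours"] for r in rows),
            "defects_per_100_prs": 100 * sum(r["defects"] for r in rows) / len(rows),
        }
    return out
```

Medians rather than means keep one pathological PR from dominating a 6–8 week sample.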
  7. When this likely fails
  • Services expose wide, bespoke APIs instead of narrow CLIs.
  • There is no shared pipeline map; flows are ad-hoc cross-service calls.
  • Harness cannot reliably run commands in realistic environments.
  • Team lacks capacity to keep contracts and docs accurate.

Under the right conditions—narrow, stable CLIs; good pipeline maps; contract-first context bridges; script- and contract-centric harness flows—a Unix-like, service-y stack can tighten review focus and make agent behavior and apprenticeship more transparent than a large monolith.