When long-running agents are allowed to perform multi-hour code refactors and simulation campaigns under principal-style contracts, what combinations of dynamic, risk-sensitive checkpointing (based on online risk signals) and compute-budget governance (start/stop/fork policies tied to those same signals) most reduce end-to-end silent-error rates per unit of human and compute budget, and how do these joint policies trade off between catching gradual, low-signal drifts and containing high-risk but rare catastrophic failures?

anthropic-scientific-computing

Answer

Best-effort answer:

  1. Most effective joint pattern (for multi-hour refactors/sim runs)
  • Use a small set of online risk signals (e.g., contract-touch fraction, cross-artifact/claim inconsistencies, anomalies in resource use, recent test failures).
  • Map these signals to three risk bands (low/medium/high) with hysteresis (to avoid flapping between bands).
  • For each band, couple:
    • checkpoint density (when to verify, when to ask for human input), and
    • compute actions (continue, throttle, fork, or halt).
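The band mapping with hysteresis described above can be sketched as follows; the score scale and the concrete thresholds are illustrative assumptions, not values from this answer:

```python
# Map a scalar risk score in [0, 1] to low/medium/high bands with
# hysteresis: the band only changes when the score crosses an entry
# threshold, and falls back only below a lower exit threshold, so a
# score hovering near a boundary does not cause flapping.

BANDS = ["low", "medium", "high"]

# (enter_up, exit_down) per band boundary: rise above enter_up to move
# up; drop below exit_down to move back down. Values are illustrative.
THRESHOLDS = {
    ("low", "medium"): (0.4, 0.3),
    ("medium", "high"): (0.7, 0.6),
}

def next_band(current: str, score: float) -> str:
    idx = BANDS.index(current)
    # Try to move up one band.
    if idx < len(BANDS) - 1:
        enter_up, _ = THRESHOLDS[(BANDS[idx], BANDS[idx + 1])]
        if score > enter_up:
            return BANDS[idx + 1]
    # Try to move down one band.
    if idx > 0:
        _, exit_down = THRESHOLDS[(BANDS[idx - 1], BANDS[idx])]
        if score < exit_down:
            return BANDS[idx - 1]
    return current
```

For example, a run in the medium band at score 0.35 stays in medium (above the 0.3 exit threshold) even though 0.35 would not have entered medium from below.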
  2. Example policy grid (sketch)
  • Low risk
    • Checkpoints: coarse, mostly automated tests + a few golden cases.
    • Compute: full speed; no forking; minimal human review.
  • Medium risk
    • Checkpoints: denser; run contract/golden suites; occasional self-adversarial probes on touched modules.
    • Compute: throttled; allow short forks for A/B comparisons; require human review for large API/schema or physics-model changes.
  • High risk
    • Checkpoints: immediate full contract/golden suite + targeted adversarial checks.
    • Compute: pause main run; spawn small, capped forks to diagnose (rollback candidate, alt implementation, replay with more logging); resume only if forks agree and tests pass; otherwise require human decision.
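One way to encode the policy grid above is a plain lookup table; the field names, throttle factors, and fork counts here are illustrative assumptions, not a prescribed schema:

```python
# Three-band policy grid: each risk band couples a checkpoint bundle
# (which checks to run) with compute actions (throttle factor, number
# of diagnostic forks allowed, and whether human review is required).
POLICY_GRID = {
    "low": {
        "checkpoint": ["automated_tests", "golden_cases_sample"],
        "compute": {"throttle": 1.0, "fork": 0, "human_review": False},
    },
    "medium": {
        "checkpoint": ["contract_suite", "golden_suite",
                       "adversarial_probes_touched_modules"],
        "compute": {"throttle": 0.5, "fork": 1,
                    "human_review": "large_changes_only"},
    },
    "high": {
        # throttle 0.0 = pause the main run; forks carry diagnosis.
        "checkpoint": ["full_contract_suite", "full_golden_suite",
                       "targeted_adversarial_checks"],
        "compute": {"throttle": 0.0, "fork": 2, "human_review": True},
    },
}

def actions_for(band: str) -> dict:
    """Look up the (checkpoint, compute) policy for a risk band."""
    return POLICY_GRID[band]
```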
  3. Where this combo beats either mechanism alone
  • Relative to dynamic checkpointing alone (cf. f7156ab6):
    • Compute governance limits blast radius when risk spikes (no long, high-risk tail of silent errors).
    • Forks let you use disagreement between branches as an extra error signal under the same total budget.
  • Relative to compute-governance alone (cf. 8214a430):
    • Risk-tuned checkpoints give the trust signals more bite: high-risk periods see stronger tests, not just slower or halted compute.
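A minimal sketch of using disagreement between branches as an extra error signal, assuming each fork reports a list of test-output strings that can be fingerprinted and compared (the representation is an assumption for illustration):

```python
import hashlib

def fingerprint(test_outputs: list[str]) -> str:
    """Reduce a fork's test outputs to a single comparable hash."""
    return hashlib.sha256("\n".join(test_outputs).encode()).hexdigest()

def forks_agree(fork_outputs: list[list[str]]) -> bool:
    """True iff all forks produced identical test outputs.

    Any mismatch is treated as an error signal: the interval is flagged
    for rollback, replay with more logging, or human review.
    """
    fingerprints = {fingerprint(outputs) for outputs in fork_outputs}
    return len(fingerprints) == 1
```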
  4. Drift vs catastrophic failures (high-level tradeoff)
  • To catch gradual, low-signal drift:
    • Maintain a thin floor of fixed, time/step-based checkpoints (golden cases + cheap schema/API checks) even in the low-risk band.
    • Periodically sample extra checks on “boring” intervals, regardless of risk score.
    • This sacrifices some compute/human budget but reduces the class of drifts that never cross a dynamic threshold.
  • To contain rare catastrophic failures:
    • Make high-risk transition thresholds conservative; when they trigger, couple:
      • aggressive checkpointing (heavy tests, self-adversarial verification on changed hotspots), and
      • strong compute actions (pause + fork under small caps).
    • This raises the chance that catastrophic bugs either crash fast under tests or show fork disagreement before consuming large compute.
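The two sides of this tradeoff can be sketched as a scheduler that merges a fixed step-based baseline (the drift floor) with risk-triggered checks; the interval and check names are illustrative assumptions:

```python
# Checks due at a given step: a fixed baseline runs on a step schedule
# regardless of the risk band (catching slow drifts that never trip a
# dynamic threshold), while band-dependent checks escalate with risk.
BASELINE_EVERY_N_STEPS = 200  # fixed floor; runs even in the low band

def checks_due(step: int, band: str) -> list[str]:
    checks = []
    if step % BASELINE_EVERY_N_STEPS == 0:
        # Drift floor: always on, independent of the risk score.
        checks += ["golden_cases", "schema_api_check"]
    if band == "medium":
        checks += ["contract_suite"]
    elif band == "high":
        # Catastrophe containment: heavy tests on risk spikes.
        checks += ["full_contract_suite", "adversarial_hotspots"]
    return checks
```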
  5. Simple practical recipe
  • Fix: (a) a minimal baseline schedule, and (b) a three-band risk→(checkpoint, compute) policy.
  • Tune:
    • Risk signals: start with contract-touch fraction, cross-artifact diff metrics, and recent failure history.
    • Band thresholds: using pilot runs, aim for most work in low band, short bursts in high band.
    • Fork policy: when risk first jumps to high, create 1–2 short forks (rollback vs current vs alt patch) and gate further compute on their agreement.
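The fork-gating step in the recipe can be sketched as a small decision function, assuming each short fork reduces to a comparable summary string (a hedged illustration, not a prescribed interface):

```python
def gate_after_high(fork_results: list[str], tests_pass: bool) -> str:
    """Decide the next compute action after a jump to the high band.

    fork_results holds one summary per short fork (e.g. rollback vs
    current vs alt patch). Compute resumes only when all forks agree
    and the full test suite passes; anything else escalates.
    """
    if len(set(fork_results)) == 1 and tests_pass:
        return "resume"
    return "escalate_to_human"
```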
  6. Outcome pattern (conjectured)
  • Silent-error rate per unit of human+compute budget drops most when:
    • Risk signals have at least modest predictive power for regressions.
    • High-risk bands are rare but trigger strong tests + compute caps.
    • A nonzero fixed baseline of checks exists to catch slow drifts.
  • Residual errors skew toward:
    • global modeling mistakes that remain internally consistent, and
    • drifts so low-signal that they never affect the chosen risk metrics and slip through baseline checks.