In multi-hour scientific computing workflows where a long-running agent alternates between progress phases and explicit self-adversarial verification phases (trying to break its own recent outputs under a fixed budget), which simple policies for scheduling and scoping these adversarial phases (e.g., after large spec diffs, after surrogate–simulation divergence, or after high contract-touching change locality) most reduce downstream silent errors per unit compute and human review compared with always-on micro-checkpointing alone?

anthropic-scientific-computing

Answer

Best guess: event-triggered, targeted self-adversarial phases beat always-on micro-checkpointing alone on errors caught per unit of oversight (compute plus human review), especially when keyed to spec and contract deltas plus surrogate–sim anomalies.

Policy sketch (relative to always-on micro-checks)

  • P1 (primary): trigger a self-adversarial phase when both:
    • spec_change_rate is high in contract-governed regions, and
    • contract_touch_fraction exceeds a threshold.
    Scope: last N changes touching those contracts. Budget: focused tests + consistency checks only there. Effect: concentrates heavy adversarial effort on edits most likely to create structural, long-lived silent errors.
  • P2 (secondary, science-facing): trigger when surrogate–simulation divergence or solver-health anomalies cross a band. Scope: recent runs in that regime; stress tests on boundary conditions and invariants. Effect: catches regime-change and numerical errors that structural metrics miss.

  • P3 (backstop): light, low-frequency adversarial sweeps on stable regions (e.g., every M checkpoints) regardless of signals. Scope: small random sample of older claims / modules. Effect: reduces risk from slow drifts or systematic modeling errors that never spike local signals.
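
The three policies above can be read as a small priority-ordered trigger function. The following is a minimal sketch, not a reference implementation: the `Signals` and `Thresholds` types, every field name other than `spec_change_rate` and `contract_touch_fraction`, all numeric defaults, and the strict P1 > P2 > P3 ordering are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    P1_CONTRACT_SPEC = "P1"  # spec churn inside contract-governed regions
    P2_DIVERGENCE = "P2"     # surrogate-sim divergence or solver anomaly
    P3_BACKSTOP = "P3"       # periodic low-frequency sweep


@dataclass(frozen=True)
class Signals:
    spec_change_rate: float          # change rate in contract-governed regions
    contract_touch_fraction: float   # share of recent changes touching contracts
    surrogate_sim_divergence: float  # signed surrogate-vs-simulation gap
    solver_anomaly: bool             # any solver-health check out of band
    checkpoints_since_backstop: int  # checkpoints since the last P3 sweep


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical placeholder values; the text above gives no numbers.
    spec_change_rate: float = 0.3
    contract_touch_fraction: float = 0.5
    divergence_band: float = 0.1
    backstop_every_m: int = 20


def choose_trigger(s: Signals, t: Thresholds = Thresholds()) -> Optional[Trigger]:
    """Return the highest-priority policy that fires, or None if none does."""
    # P1 requires BOTH conditions, as in the policy sketch.
    if (s.spec_change_rate > t.spec_change_rate
            and s.contract_touch_fraction > t.contract_touch_fraction):
        return Trigger.P1_CONTRACT_SPEC
    # P2 fires on either a divergence outside the band or a solver anomaly.
    if s.solver_anomaly or abs(s.surrogate_sim_divergence) > t.divergence_band:
        return Trigger.P2_DIVERGENCE
    # P3 fires every M checkpoints regardless of the other signals.
    if s.checkpoints_since_backstop >= t.backstop_every_m:
        return Trigger.P3_BACKSTOP
    return None
```

Returning a single trigger keeps each adversarial phase scoped to one concern. A variant that returns every firing policy would also be reasonable if the budget allows phases to be stacked.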

Relative to always-on micro-checkpointing

  • More selective: fewer checks overall, but each adversarial phase is deeper and targeted.
  • Likely gains:
    • Higher probability of catching high-impact silent errors tied to big spec / contract changes and regime shifts.
    • Lower human-review load if only high-risk adversarial findings escalate.
  • Likely losses:
    • Some low-signal drifts that micro-checks might catch early will wait for periodic backstop phases or human review.

Net: A combined regime—light, always-on micro-checks plus signal-triggered, scoped adversarial phases keyed to (1) contract-touching spec diffs and (2) surrogate–sim/solver anomalies—probably gives the best error-reduction per unit compute and human review in multi-hour scientific workflows.
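
To compare regimes on the "per unit compute and human review" criterion, one possible scoring function is sketched below. The function name, the additive cost model, and the `review_weight` parameter (how many compute units one unit of human review is worth) are all assumptions for illustration; the source does not define a specific metric.

```python
def errors_caught_per_oversight(errors_caught: int,
                                compute_cost: float,
                                review_cost: float,
                                review_weight: float = 1.0) -> float:
    """Silent errors caught per unit of combined oversight cost.

    Assumes cost is additive: compute_cost + review_weight * review_cost.
    """
    total = compute_cost + review_weight * review_cost
    if total <= 0:
        raise ValueError("total oversight cost must be positive")
    return errors_caught / total
```

Scoring each regime (always-on micro-checks alone, triggered phases alone, and the combined regime) with the same `review_weight` makes the comparison explicit, and varying the weight shows how sensitive the ranking is to the price placed on human attention.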