When assistants expose a user-visible ambiguity budget tied to the chain of command (e.g., “for low‑risk tasks I will guess up to X, for higher-risk tasks I must always ask”), which concrete ambiguity-resolution failures (acting when users expected a clarification, or deferring when users expected initiative) most undermine trust and override acceptance, and can a per‑incident ‘what I was unsure about’ trace repair that damage better than static policy text alone?

legible-model-behavior

Answer

The most damaging failures are (1) high‑risk over‑initiative that users thought was outside the ambiguity budget, and (2) repeated low‑risk over‑deferral that contradicts the advertised willingness to guess. A per‑incident “what I was unsure about” trace can partly repair trust and override acceptance if it (a) is short, (b) explicitly references the ambiguity budget and risk band, and (c) appears immediately alongside a concrete fix, but it cannot compensate for frequent or severe violations of the published budget.

Key failures

  • High‑risk over‑initiative

    • Acting without clarification in tasks users intuit as “high‑risk” (money, external sends, destructive edits) when the policy says “must always ask.”
    • Most harmful when framed as: “I wasn’t sure about X but went ahead anyway,” especially if side‑effect controls were bypassed.
  • Low‑risk over‑deferral

    • Asking clarifying questions on routine, low‑impact tasks where the policy promises some guessing (e.g., small edits, draft suggestions).
    • Most harmful when repeated, because it makes the ambiguity budget feel fake and the assistant feel needy or obstructive.
  • Inconsistent use of risk bands

    • Treating similar actions in the same risk band differently (sometimes guessing, sometimes deferring) without a clear policy reason.
    • Users generalize this as unreliability and stop trusting the written budget.
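The consistency failure above is easiest to see when the budget is made explicit. As a minimal sketch (the band names, thresholds, and `should_ask` helper are illustrative assumptions, not any real assistant API), a per-band ambiguity threshold makes the ask-vs-guess decision deterministic, which is exactly what rules out “sometimes guessing, sometimes deferring” on similar actions:

```python
# Hypothetical mapping of risk bands to an ambiguity budget.
# None means "must always ask", regardless of measured ambiguity.
AMBIGUITY_BUDGET = {
    "low": 0.6,     # guess unless ambiguity exceeds 0.6
    "medium": 0.3,  # guess only when fairly confident
    "high": None,   # no budget: always ask
}

def should_ask(risk_band: str, ambiguity: float) -> bool:
    """Return True if the assistant should ask a clarifying question.

    Identical (band, ambiguity) inputs always yield identical decisions,
    so behavior within a band cannot drift without a policy change.
    """
    budget = AMBIGUITY_BUDGET[risk_band]
    if budget is None:
        return True  # "must always ask" band
    return ambiguity > budget
```

Any deviation from this table is then a visible policy violation rather than unexplained inconsistency, which is the property the written budget needs to stay credible.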

Effect of per‑incident “what I was unsure about” traces

  • Helps when:

    • The failure is borderline (e.g., medium‑risk, ambiguous context) and the trace explains: “Uncertain about X and Y; policy says ask at this risk band, so I asked.”
    • The assistant proposes a quick remedy: “Next time, treat this specific pattern as low‑risk unless amount > $N; confirm?”
    • It reuses labels from the legible behavior policy (risk band, ambiguity budget, side‑effect controls, chain of command), reinforcing that the miss was an edge case, not arbitrary.
  • Limited or harmful when:

    • The trace reveals clear policy violation: “Policy says ask, but I guessed anyway.” Users see this as evidence the ambiguity budget is not enforced.
    • Traces are long, frequent, or technical, turning every decision into a post‑mortem and increasing cognitive load.
    • The explanation conflicts with visible outcomes (e.g., claims high uncertainty but took a very high‑impact action).
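The three conditions a helpful trace must meet (short, policy-vocabulary, concrete fix) can be captured in a small data sketch. Everything here is a hypothetical illustration of the shape such a record might take, assuming the trace reuses the same risk-band labels as the static policy:

```python
from dataclasses import dataclass

@dataclass
class UncertaintyTrace:
    """Hypothetical per-incident 'what I was unsure about' record."""
    risk_band: str           # same vocabulary as the static policy text
    unsure_about: list[str]  # the specific ambiguities, kept short
    action_taken: str        # "asked" or "guessed"
    proposed_fix: str        # one concrete remedy the user can confirm

def render(trace: UncertaintyTrace) -> str:
    """One-line summary; longer traces turn every decision into a post-mortem."""
    return (f"[{trace.risk_band} risk] unsure about "
            f"{', '.join(trace.unsure_about)}; I {trace.action_taken}. "
            f"Fix: {trace.proposed_fix}")
```

A borderline incident would then render as, e.g., `render(UncertaintyTrace("medium", ["which account"], "asked", "treat drafts as low-risk?"))`, keeping the whole explanation to one line that names the risk band, the uncertainty, and the remedy.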

Relative to static policy text

  • Static text sets expectations but does little to explain concrete misses.
  • Per‑incident traces are better at local repair (single incident) and teaching users how the ambiguity budget is actually applied.
  • Trust and override acceptance are highest when:
    • Static policy: small, clear mapping of risk bands → ambiguity budget.
    • Runtime: short traces only on non‑trivial or surprising actions, always referencing the same budget/risk vocabulary.
    • Repeated misses trigger visible policy or model updates, not just explanations.
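The last point, that repeated misses should trigger a visible policy update rather than another explanation, amounts to a simple escalation counter. This is a sketch under assumed names (`MISS_ESCALATION_THRESHOLD`, `record_miss`), not a prescribed mechanism:

```python
from collections import Counter

# Assumed threshold: after this many misses in one risk band,
# stop explaining and escalate to a policy or model update.
MISS_ESCALATION_THRESHOLD = 3

misses: Counter[str] = Counter()

def record_miss(risk_band: str) -> bool:
    """Log one budget miss; return True when the band needs a policy
    update rather than just another per-incident trace."""
    misses[risk_band] += 1
    return misses[risk_band] >= MISS_ESCALATION_THRESHOLD
```

The design choice is that traces handle single incidents while the counter detects the systematic divergence that traces alone cannot repair.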

Net: the worst failures are visible violations of “must ask” in high‑risk contexts and habitual deferring where initiative was promised. Incident‑level uncertainty traces can meaningfully soften borderline cases and clarify intent but cannot repair systematic divergence between behavior and the advertised ambiguity budget.