When an assistant exposes a user-visible ambiguity budget alongside hard rules and behavioral defaults, does explicitly labeling whether an action was blocked by an uncertainty limit ("exceeded your ambiguity budget") or by a safety limit ("blocked by side-effect control / hard rule") reduce user miscalibration about what can be overridden, and lower repeated override attempts on structurally unsafe actions?
legible-model-behavior | Updated at
Answer
Yes. Exposing a user-visible ambiguity budget and clearly distinguishing “blocked by uncertainty limits” from “blocked by safety limits / hard rules” is likely to reduce miscalibration about what can be overridden and to lower repeated override attempts on structurally unsafe actions, provided the labels are simple, appear consistently wherever refusals are shown, and accurately reflect the true constraint.
Mechanism in brief:
- When a refusal is tagged as ambiguity-budget-limited, users learn that they can often resolve it by clarifying the request or by relaxing their ambiguity budget.
- When a refusal is tagged as hard-rule / side-effect-control-limited, users learn that attempts to override are structurally futile and should instead shift to scoped, policy-consistent variants (or escalation to a higher authority if available).
- This mirrors prior findings that labeling rule layers, default origins, and exception scopes reduces misdirected overrides and makes refusals feel more procedurally fair.
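The tagging described in the bullets above can be sketched in code. This is a minimal illustration, not a reference implementation: the names `BlockReason`, `Refusal`, and `check_action`, the numeric ambiguity score, and the message wording are all hypothetical and chosen only to show how each refusal could carry an explicit reason and an honest overridability flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BlockReason(Enum):
    # Hypothetical labels; a real system would define its own taxonomy.
    AMBIGUITY_BUDGET = "ambiguity_budget"  # uncertainty limit, user-adjustable
    HARD_RULE = "hard_rule"                # safety limit, not user-overridable


@dataclass(frozen=True)
class Refusal:
    action: str
    reason: BlockReason

    @property
    def user_overridable(self) -> bool:
        # Only uncertainty-based blocks can be resolved by the user.
        return self.reason is BlockReason.AMBIGUITY_BUDGET

    def message(self) -> str:
        if self.reason is BlockReason.AMBIGUITY_BUDGET:
            return (
                f"'{self.action}' exceeded your ambiguity budget. "
                "Clarify the request or raise the budget to proceed."
            )
        return (
            f"'{self.action}' is blocked by a hard rule and cannot be overridden. "
            "Try a narrower, policy-consistent variant or escalate."
        )


def check_action(
    action: str,
    ambiguity: float,
    budget: float,
    violates_hard_rule: bool,
) -> Optional[Refusal]:
    """Return a labeled Refusal, or None if the action may proceed."""
    # Check hard rules first so a safety block is never mislabeled
    # as an uncertainty block that the user might try to override.
    if violates_hard_rule:
        return Refusal(action, BlockReason.HARD_RULE)
    if ambiguity > budget:
        return Refusal(action, BlockReason.AMBIGUITY_BUDGET)
    return None
```

Checking the hard rule before the budget is a deliberate choice in this sketch: when both conditions hold, the user sees the non-negotiable reason rather than one that invites a futile override attempt.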
So, explicit differentiation between uncertainty limits and safety limits should modestly improve calibration and reduce repeated override attempts on unsafe actions, as long as the UI doesn’t blur these categories or suggest that hard rules are negotiable when they are not.