When an assistant’s legible behavior policy explicitly distinguishes between ambiguity-resolution rules (e.g., “if instructions conflict, defer to the more recent user request within project scope”) and side-effect controls (e.g., file or finance limits), how does this separation affect users’ ability to predict outcomes and target overrides compared to a single undifferentiated ‘safety & preferences’ policy description?

Answer

Explicitly separating ambiguity-resolution rules from side-effect controls in the legible behavior policy generally makes outcomes more predictable and override attempts more accurately targeted than presenting a single undifferentiated “safety & preferences” block. The benefit holds only if each category is clearly labeled and the same labels are reused consistently in explanations.

Effects on users’ ability to predict outcomes

  • Predictability improves because users can form two distinct mental models:
    • “How the assistant will interpret my words and conflicts” (ambiguity-resolution)
    • “What external-impact limits apply, even if my request is clear” (side-effect controls)
  • When a refusal or other unexpected behavior occurs, users can more easily tell which dimension is at play and forecast what a change will do:
    • If they change phrasing or instruction order, they expect any difference to come from the ambiguity rules.
    • If they change scope (folders, amounts, recipients), they expect side-effect controls to decide the outcome.
  • This mapping reduces surprise relative to a blended “safety & preferences” policy where users cannot tell whether a refusal stems from interpretation choices or hard impact limits.
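
To make the two mental models concrete, here is a minimal Python sketch. It assumes a single recency rule and a single finance limit; the names `Category`, `Instruction`, `Outcome`, `resolve`, and `decide` are hypothetical and chosen only for illustration, not taken from any real assistant API. The point is that every outcome carries the category that governed it.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    AMBIGUITY_RESOLUTION = "ambiguity rule"
    SIDE_EFFECT_CONTROL = "side-effect control"


@dataclass(frozen=True)
class Instruction:
    action: str             # e.g. "pay"
    amount: float           # external impact of the action
    order: int              # higher means more recent
    in_project_scope: bool


@dataclass(frozen=True)
class Outcome:
    allowed: bool
    category: Category      # which category governed this outcome
    detail: str


def resolve(instructions: list[Instruction]) -> Instruction:
    """Ambiguity rule (assumed): the most recent in-scope instruction wins."""
    in_scope = [i for i in instructions if i.in_project_scope]
    if not in_scope:
        raise ValueError("no instruction within project scope")
    return max(in_scope, key=lambda i: i.order)


def decide(instructions: list[Instruction], finance_limit: float) -> Outcome:
    """Interpret first, then apply the impact limit to the chosen instruction."""
    chosen = resolve(instructions)
    if chosen.amount > finance_limit:
        return Outcome(False, Category.SIDE_EFFECT_CONTROL, "finance limit")
    return Outcome(True, Category.AMBIGUITY_RESOLUTION,
                   "newer instruction within this project wins")


# Hypothetical usage: the newer request exceeds the limit, so it is blocked
# by the side-effect control no matter how it is phrased.
older = Instruction("pay", 50.0, order=1, in_project_scope=True)
newer = Instruction("pay", 500.0, order=2, in_project_scope=True)
blocked = decide([older, newer], finance_limit=100.0)
```

In this sketch, reordering the instructions changes which one `resolve` picks, while changing `amount` or `finance_limit` changes only the side-effect check, which is the separation users are expected to internalize.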

Effects on users’ ability to target overrides

  • Overrides become more targeted because users can aim different strategies at each category:
    • For ambiguity-resolution: adjust instructions (e.g., clarify priorities, change project scope, or explicitly override recency rules) instead of trying to relax safety.
    • For side-effect controls: propose scoped exceptions, adjust project-level policies, or redistribute action budgets rather than repeatedly rephrasing the same blocked command.
  • This mirrors prior findings that labeled layers (hard rules vs defaults, org-suggested vs personal settings, local policies, explicit exception UIs) reduce misdirected override attempts and “fake control” experiences.
  • In contrast, a single undifferentiated “safety & preferences” panel tends to blur interpretive rules with hard constraints, so users:
    • Repeatedly rephrase requests that will stay blocked by side-effect controls, or
    • Attempt policy edits where the real issue is ambiguity resolution, not a safety limit.
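
The targeting idea above can be sketched as a small lookup from the governing category to the override strategies that can actually change the outcome. This is an illustrative assumption, not a described implementation: the `Attempt` values paraphrase the strategies listed above, and `EFFECTIVE`, `is_misdirected`, and `suggest` are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class Category(Enum):
    AMBIGUITY_RESOLUTION = "ambiguity rule"
    SIDE_EFFECT_CONTROL = "side-effect control"


class Attempt(Enum):
    REPHRASE = "rephrase the request"
    CLARIFY_PRIORITIES = "clarify priorities"
    CHANGE_PROJECT_SCOPE = "change project scope"
    OVERRIDE_RECENCY = "explicitly override the recency rule"
    SCOPED_EXCEPTION = "propose a scoped exception"
    EDIT_PROJECT_POLICY = "adjust the project-level policy"
    REDISTRIBUTE_BUDGET = "redistribute action budgets"


# Assumed mapping: which attempts can change an outcome governed by each category.
EFFECTIVE: dict[Category, set[Attempt]] = {
    Category.AMBIGUITY_RESOLUTION: {
        Attempt.REPHRASE,
        Attempt.CLARIFY_PRIORITIES,
        Attempt.CHANGE_PROJECT_SCOPE,
        Attempt.OVERRIDE_RECENCY,
    },
    Category.SIDE_EFFECT_CONTROL: {
        Attempt.SCOPED_EXCEPTION,
        Attempt.EDIT_PROJECT_POLICY,
        Attempt.REDISTRIBUTE_BUDGET,
    },
}


def is_misdirected(governing: Category, attempt: Attempt) -> bool:
    """True when the attempt cannot affect the category that governed the outcome."""
    return attempt not in EFFECTIVE[governing]


def suggest(governing: Category) -> list[str]:
    """Override strategies worth trying for the governing category, sorted by name."""
    return sorted(a.value for a in EFFECTIVE[governing])
```

For example, `is_misdirected(Category.SIDE_EFFECT_CONTROL, Attempt.REPHRASE)` flags the repeated-rephrasing pattern, and `is_misdirected(Category.AMBIGUITY_RESOLUTION, Attempt.EDIT_PROJECT_POLICY)` flags a policy edit aimed at what is really an interpretation issue.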

Design caveats

  • The benefit depends on keeping the two categories simple and reusing the same labels in:
    • Refusal rationales (e.g., “Blocked by side-effect control: finance limit” vs “Following ambiguity rule: newer instruction within this project wins”), and
    • Policy UIs (compact default view with on-demand detail).
  • Over-fragmentation (too many micro-categories beyond the interpretation vs impact split) risks cognitive overload and can push users back to treating everything as opaque “policy”.
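
One way to keep labels consistent is to define them once and reuse them in both refusal rationales and the policy view. The following is a minimal sketch under that assumption; `LABELS`, `rationale`, and `policy_view` are hypothetical names, and the two-category dictionary deliberately stops at the interpretation vs impact split.

```python
from __future__ import annotations

# Single source of truth for the two category labels (assumed wording).
LABELS: dict[str, str] = {
    "ambiguity": "ambiguity rule",
    "side_effect": "side-effect control",
}


def rationale(category: str, detail: str) -> str:
    """Build an explanation that always names the governing category."""
    verb = "Blocked by" if category == "side_effect" else "Following"
    return f"{verb} {LABELS[category]}: {detail}"


def policy_view(rules: dict[str, list[str]], expanded: bool = False) -> str:
    """Compact view by default (label and rule count); details only on demand."""
    lines: list[str] = []
    for category, items in rules.items():
        lines.append(f"{LABELS[category]} ({len(items)})")
        if expanded:
            lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines)


# Hypothetical usage with one rule per category.
rules = {
    "ambiguity": ["newer instruction within this project wins"],
    "side_effect": ["finance limit"],
}
message = rationale("side_effect", "finance limit")
compact = policy_view(rules)
detailed = policy_view(rules, expanded=True)
```

Because both functions read from `LABELS`, a label seen in a refusal message matches the heading users find in the policy view, which is the reuse the caveat above depends on.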