When an assistant’s legible behavior policy explicitly distinguishes between ambiguity-resolution rules (e.g., “if instructions conflict, defer to the more recent user request within project scope”) and side-effect controls (e.g., file or finance limits), how does this separation affect users’ ability to predict outcomes and target overrides compared to a single undifferentiated ‘safety & preferences’ policy description?

legible-model-behavior | Updated at 2026-04-06 18:51

Answer

Explicitly separating ambiguity-resolution rules from side-effect controls in the legible behavior policy generally makes outcomes more predictable and override attempts better targeted than a single undifferentiated “safety & preferences” block, provided each category is clearly labeled and those labels are reused consistently in explanations.

Effects on users’ ability to predict outcomes

Predictability improves because users can form two distinct mental models:
- “How the assistant will interpret my words and conflicts” (ambiguity-resolution)
- “What external-impact limits apply, even if my request is clear” (side-effect controls)
When a refusal or unexpected behavior occurs, users can more easily tell which dimension is at play:
- If they change phrasing or instruction order, they can expect the outcome to shift according to the ambiguity-resolution rules.
- If they change scope (folders, amounts, recipients), they can expect the side-effect controls to decide whether the action goes through.
This mapping reduces surprise relative to a blended “safety & preferences” policy where users cannot tell whether a refusal stems from interpretation choices or hard impact limits.
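The two mental models can be sketched as a small two-stage evaluator: it first resolves conflicting instructions, then checks side-effect limits, and tags each outcome with the category that decided it. This is a minimal sketch under stated assumptions: `Instruction`, `resolve_ambiguity`, `check_side_effects`, the per-project recency rule, and the single `finance_limit` are hypothetical illustrations, not any real assistant's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    # The two user-visible policy categories (hypothetical labels).
    AMBIGUITY_RULE = "ambiguity rule"
    SIDE_EFFECT_CONTROL = "side-effect control"


@dataclass
class Instruction:
    text: str
    project: str
    timestamp: int        # larger value = more recent instruction
    spend: float = 0.0    # money the instruction would move, if any


@dataclass
class Outcome:
    allowed: bool
    category: Category    # which category decided the outcome
    reason: str
    chosen: Optional[Instruction] = None


def resolve_ambiguity(instructions: list[Instruction], project: str) -> Instruction:
    """Hypothetical rule: if instructions conflict, the most recent one
    within the current project scope wins."""
    in_scope = [i for i in instructions if i.project == project]
    if not in_scope:
        raise ValueError(f"no instructions for project {project!r}")
    return max(in_scope, key=lambda i: i.timestamp)


def check_side_effects(instr: Instruction, finance_limit: float) -> Optional[str]:
    """Hypothetical hard limit: return a reason string if blocked, else None."""
    if instr.spend > finance_limit:
        return f"finance limit ({instr.spend:.2f} > {finance_limit:.2f})"
    return None


def evaluate(instructions: list[Instruction], project: str,
             finance_limit: float) -> Outcome:
    # Stage 1 (interpretation): rephrasing or reordering only affects this stage.
    chosen = resolve_ambiguity(instructions, project)
    # Stage 2 (impact): changing amounts, folders, or recipients affects this stage.
    blocked = check_side_effects(chosen, finance_limit)
    if blocked:
        return Outcome(False, Category.SIDE_EFFECT_CONTROL, blocked, chosen)
    return Outcome(True, Category.AMBIGUITY_RULE,
                   "newer instruction within this project wins", chosen)
```

Because every outcome carries a `category`, a user who raises an amount past the limit sees the side-effect label, while a user who only reorders instructions sees stage 1 pick a different instruction; a blended policy would return the same unlabeled refusal in both cases.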

Effects on users’ ability to target overrides

Overrides become more targeted because users can aim different strategies at each category:
- For ambiguity-resolution: adjust instructions (e.g., clarify priorities, change project scope, or explicitly override recency rules) instead of trying to relax safety.
- For side-effect controls: propose scoped exceptions, adjust project-level policies, or redistribute action budgets rather than repeatedly rephrasing the same blocked command.
This mirrors prior findings that labeled layers (hard rules vs defaults, org-suggested vs personal settings, local policies, explicit exception UIs) reduce misdirected override attempts and “fake control” experiences.
In contrast, a single undifferentiated “safety & preferences” panel tends to conflate interpretive rules with hard constraints, so users often:
- Repeatedly rephrase requests that will stay blocked by side-effect controls, or
- Attempt policy edits where the real issue is ambiguity resolution, not a safety limit.
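The targeting argument can be made concrete by recording which category each override move can actually change; a move aimed at the other category is, by construction, misdirected. This is a minimal sketch, assuming a hypothetical `Override` enum and `AFFECTS` mapping that paraphrase the strategies listed above; a real policy UI would need its own mapping.

```python
from enum import Enum


class Category(Enum):
    # The two hypothetical policy categories, as surfaced to users.
    AMBIGUITY_RULE = "ambiguity rule"
    SIDE_EFFECT_CONTROL = "side-effect control"


class Override(Enum):
    # Hypothetical override moves a user might try.
    REPHRASE = "rephrase the request"
    CLARIFY_PRIORITY = "clarify which instruction has priority"
    OVERRIDE_RECENCY = "explicitly override the recency rule"
    SCOPED_EXCEPTION = "request a scoped exception"
    EDIT_PROJECT_POLICY = "adjust the project-level policy"
    REDISTRIBUTE_BUDGET = "redistribute the action budget"


# Which category each move can actually change.
AFFECTS: dict[Override, Category] = {
    Override.REPHRASE: Category.AMBIGUITY_RULE,
    Override.CLARIFY_PRIORITY: Category.AMBIGUITY_RULE,
    Override.OVERRIDE_RECENCY: Category.AMBIGUITY_RULE,
    Override.SCOPED_EXCEPTION: Category.SIDE_EFFECT_CONTROL,
    Override.EDIT_PROJECT_POLICY: Category.SIDE_EFFECT_CONTROL,
    Override.REDISTRIBUTE_BUDGET: Category.SIDE_EFFECT_CONTROL,
}


def suggest_overrides(blocked_by: Category) -> list[Override]:
    """Moves that target the category that actually decided the outcome."""
    return [move for move, cat in AFFECTS.items() if cat is blocked_by]


def is_misdirected(blocked_by: Category, attempt: Override) -> bool:
    """True when the attempt targets the other category, e.g. rephrasing a
    request that a side-effect control blocked."""
    return AFFECTS[attempt] is not blocked_by
```

With a blended single policy there is no `blocked_by` value to compare against, which is one way to see why users fall back on trial-and-error rephrasing or misplaced policy edits.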

Design caveats

The benefit depends on keeping the two categories simple and reusing the same labels in:
- Refusal rationales (e.g., “Blocked by side-effect control: finance limit” vs “Following ambiguity rule: newer instruction within this project wins”), and
- Policy UIs (compact default view with on-demand detail).
Over-fragmentation (too many micro-categories beyond interpretation vs impact) risks cognitive overload and can push users back to treating everything as opaque “policy”.
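Reusing one label set across refusal rationales and the policy UI can be sketched by deriving both from the same definitions. This is a minimal sketch under assumptions: `Rule`, `rationale`, `compact_view`, and the `PREFIX` strings (taken from the rationale examples above) are hypothetical. Because the compact view is built from a two-member enum, adding a micro-category would be a deliberate change in one place rather than drift across screens.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Exactly two user-facing categories: interpretation vs impact.
    AMBIGUITY_RULE = "ambiguity rule"
    SIDE_EFFECT_CONTROL = "side-effect control"


# One prefix per category, reused verbatim in every refusal rationale.
PREFIX = {
    Category.AMBIGUITY_RULE: "Following ambiguity rule",
    Category.SIDE_EFFECT_CONTROL: "Blocked by side-effect control",
}


@dataclass(frozen=True)
class Rule:
    category: Category
    summary: str       # short label shown in the compact default view
    detail: str = ""   # longer explanation shown only on demand


def rationale(rule: Rule) -> str:
    """Explanation text built from the same label the policy UI shows."""
    return f"{PREFIX[rule.category]}: {rule.summary}"


def compact_view(rules: list[Rule]) -> dict[str, list[str]]:
    """Default policy view: one group per category, summaries only."""
    view: dict[str, list[str]] = {cat.value: [] for cat in Category}
    for rule in rules:
        view[rule.category.value].append(rule.summary)
    return view
```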