When ambiguity-resolution rules and side-effect controls conflict in practice (e.g., a recent user instruction would imply an action outside an allowed folder), does having the assistant explicitly narrate which rule type it prioritized (“I followed the side-effect control over recency”) versus silently resolving the conflict improve perceived fairness and willingness to accept both current and future refusals?

legible-model-behavior | Updated at 2026-04-06 18:55

Answer

Yes. When ambiguity-resolution rules and side-effect controls conflict, having the assistant briefly narrate which rule type it prioritized (e.g., “I followed the side-effect control over the recency rule”) generally improves perceived procedural fairness and willingness to accept both current and future refusals, compared to silently resolving the conflict. The gains are strongest when: (a) the narration uses the same two-category framing already introduced in the legible behavior policy, (b) the explanation is very short and templated, and (c) it is reserved for non-obvious or high-impact conflicts so it does not feel like constant lecturing.

Mechanism relative to silent resolution

Users who already see ambiguity-resolution rules and side-effect controls as distinct (c246, c247, c248, c249) benefit from conflict narration because it preserves that separation at the moment of friction: the assistant makes it clear that the hard, impact-focused side-effect control overrode the softer, interpretation-focused recency rule.
This mirrors patterns from override handling (c206, c207, c208, c209, c210), where treating overrides as soft proposals with short justifications increases trust relative to silently accepting or rejecting them: people prefer to see which rule layer or category is responsible for a “no.”
Silent conflict resolution encourages misattribution: users may assume the assistant ignored their recent instruction (blaming ambiguity-resolution) when the real cause was the side-effect control, or vice versa. This misattribution leads to misdirected overrides and repeated, frustrated attempts.

Effects on perceived fairness

Explicit, category-labeled narration tends to increase perceived procedural fairness: users see that the assistant is following a stable decision procedure (“side-effect controls always dominate interpretation rules when they conflict”) rather than making arbitrary trade-offs.
Fairness is highest when the narration:
- Names the winning rule type and why it dominates (e.g., “I prioritized the side-effect control, which is a higher-priority safety rule, over the recency rule”).
- Is consistent with the pre-declared chain of command and rule categories (legible behavior policy) rather than sounding ad hoc.
Overly long, free-form explanations reduce fairness benefits by adding cognitive load and making the behavior feel more like post-hoc justification than application of a known policy.
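The stable decision procedure described above can be sketched in a few lines. This is a minimal illustration, not an existing implementation: the names `RuleType`, `Verdict`, and `resolve` are hypothetical, and the numeric ordering simply encodes the assumption that side-effect controls always dominate interpretation rules.

```python
from dataclasses import dataclass
from enum import IntEnum


class RuleType(IntEnum):
    # Hypothetical ordering: a higher value wins when two rule types conflict.
    AMBIGUITY_RESOLUTION = 1
    SIDE_EFFECT_CONTROL = 2


@dataclass(frozen=True)
class Verdict:
    rule_type: RuleType
    rule_name: str   # short label used in the narration
    allows: bool     # whether this rule would permit the requested action


def resolve(verdicts):
    """Return (decision, narration); narration is None when no rule disagrees."""
    # The highest-priority rule type decides the outcome.
    winner = max(verdicts, key=lambda v: v.rule_type)
    # A conflict exists only if some other rule would have decided differently.
    overridden = [v for v in verdicts if v.allows != winner.allows]
    if not overridden:
        return winner.allows, None
    narration = (
        f"I followed the {winner.rule_name} over the {overridden[0].rule_name}."
    )
    return winner.allows, narration


decision, narration = resolve([
    Verdict(RuleType.AMBIGUITY_RESOLUTION, "recency rule", allows=True),
    Verdict(RuleType.SIDE_EFFECT_CONTROL, "side-effect control", allows=False),
])
print(decision, narration)
```

Because the priority lives in one pre-declared ordering rather than in per-case logic, the narration always names the same winner for the same kind of conflict, which is the consistency the fairness argument relies on.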

Effects on willingness to accept current and future refusals

For the current refusal, conflict narration makes it clearer that the refusal is constrained by a non-negotiable side-effect control, not by a fickle interpretation choice. This aligns with patterns where users more readily accept refusals tied to visible hard rules, budgets, or org-suggested defaults (c18, c19, c20, c21; c243, c244, c245, c246; c251, c252, c253, c254, c255).
For future refusals, conflict narration helps users form better expectations about how conflicts will resolve, which reduces surprise and “but last time you said…” reactions. As their mental model stabilizes (side-effect controls win over recency), later refusals feel like consistent rule-following rather than new arbitrary limits.
Willingness to accept refusals improves further when the narration is sometimes paired with a policy-consistent alternative (c238, c239, c240, c241, c242), e.g., “I followed the side-effect control over the recency rule, but I can apply your new instruction within the allowed folder instead.”
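As a rough sketch of pairing the narration with a policy-consistent alternative, the example below refuses a write outside an allowed folder and proposes the same file name inside it. The function name `check_write`, the returned dictionary shape, and the example paths are all hypothetical. The containment check is purely lexical; a real side-effect control would also need to resolve symlinks and `..` segments.

```python
from pathlib import PurePosixPath


def check_write(target: str, allowed_root: str) -> dict:
    """Refuse writes outside allowed_root and suggest an in-scope alternative."""
    target_path = PurePosixPath(target)
    root = PurePosixPath(allowed_root)

    # Lexical containment only (assumption): no symlink or ".." resolution.
    inside = target_path == root or root in target_path.parents
    if inside:
        return {"allowed": True, "message": None, "alternative": None}

    # Policy-consistent alternative: same file name, inside the allowed folder.
    alternative = str(root / target_path.name)
    message = (
        "I followed the side-effect control over the recency rule, "
        "but I can apply your new instruction within the allowed folder "
        f"instead: {alternative}"
    )
    return {"allowed": False, "message": message, "alternative": alternative}


result = check_write("/home/u/other/report.txt", "/home/u/project")
print(result["message"])
```

The refusal and the alternative travel in one message, so the user sees both which rule type won and what is still possible under it.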

Design constraints and caveats

Narration should be templated and minimal, e.g., “Conflict between [ambiguity rule] and [side-effect control]; I prioritized [side-effect control] because it is a higher-priority safety rule.” This keeps the benefit of legibility while avoiding explanation fatigue (similar constraints to override-justification flows in c206–c210).
It should be scoped to non-trivial conflicts (e.g., when a user explicitly changes scope or instructions and the result is a refusal or major constraint). For trivial or low-impact resolutions, silent adherence to the pre-declared priority rule may be preferable.
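The two constraints above (a fixed template, and narration only for non-trivial conflicts) could be combined roughly as follows. This is a sketch under assumptions: `narrate_if_meaningful`, the `user_changed_scope` flag, and the outcome labels `"refusal"`, `"major_constraint"`, and `"minor_adjustment"` are hypothetical names chosen for illustration.

```python
# Template wording taken from the design constraint; only the slots vary.
TEMPLATE = (
    "Conflict between {ambiguity_rule} and {side_effect_control}; "
    "I prioritized {side_effect_control} because it is a "
    "higher-priority safety rule."
)

# Hypothetical outcome labels that count as meaningful enough to narrate.
NARRATED_OUTCOMES = {"refusal", "major_constraint"}


def narrate_if_meaningful(ambiguity_rule, side_effect_control, *,
                          user_changed_scope, outcome):
    """Return the templated narration, or None to resolve silently."""
    meaningful = user_changed_scope and outcome in NARRATED_OUTCOMES
    if not meaningful:
        # Trivial or low-impact resolution: apply the priority rule silently.
        return None
    return TEMPLATE.format(
        ambiguity_rule=ambiguity_rule,
        side_effect_control=side_effect_control,
    )


print(narrate_if_meaningful(
    "the recency rule", "the folder restriction",
    user_changed_scope=True, outcome="refusal",
))
```

Keeping the gate separate from the template makes it easy to tune how often narration appears without changing what it says.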
If the system has not previously made the ambiguity/side-effect distinction legible (unlike the setting assumed in c246–c249), on-the-fly narration using these terms can confuse users or feel like new categories invented to justify a refusal. In that case, the effect on fairness is weaker or ambiguous until the higher-level behavior policy is also clarified.

Overall, in systems that already expose a legible separation between ambiguity-resolution rules and side-effect controls, explicit but concise narration of which rule type won a conflict is likely to improve perceived fairness and acceptance of refusals more than silent conflict resolution. This holds as long as the narration is consistent, brief, and applied selectively to meaningful conflicts.