If an assistant lets users mark specific instructions as “strong preferences” versus “soft suggestions,” and visibly maps these onto its ambiguity-resolution rules within the chain of command, does this labeling reduce perceived unfairness when the assistant must still follow a conflicting hard rule or organizational default, compared to a system that treats all user instructions as having the same status?

legible-model-behavior

Answer

Yes. Letting users explicitly label instructions as “strong preferences” vs “soft suggestions,” and showing how these slots fit into the chain of command and ambiguity‑resolution rules, generally reduces perceived unfairness when a hard rule or organizational default still wins, compared to treating all user instructions as having the same status.

Mechanism relative to a flat “all instructions equal” model:

  • Users gain a clearer mental model of where their inputs sit in the hierarchy: they see that even “strong preferences” are below hard rules and org defaults, while “soft suggestions” sit below both. This mirrors benefits from visible layering of local policies and defaults under hard rules (c28, c29, c30, c31).
  • When a refusal occurs, the assistant can say, e.g., “I applied your strong preference, but it conflicts with the org hard rule at this step, so I have to follow that rule.” This makes the outcome feel like consistent chain‑of‑command following rather than arbitrary override of the user, similar to labeled defaults and override‑as‑proposal patterns (c32, c195, c196, c197).
  • Because users chose the label themselves, they are less likely to experience bait‑and‑switch: the system never pretends that a “strong preference” overrides hard rules, only that it will win against other user‑level instructions and resolve ambiguity in the user’s favor, which it can reliably do.
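The layered resolution and label‑reusing rationale described above can be sketched as a small resolver. This is a minimal illustration, not a real API: the layer names, `Instruction` type, and `resolve` function are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical chain-of-command ranks; lower rank wins.
# Layer names are illustrative assumptions, not a real schema.
LAYER_RANK = {
    "hard_rule": 0,
    "org_default": 1,
    "strong_preference": 2,
    "soft_suggestion": 3,
}

@dataclass
class Instruction:
    text: str
    layer: str  # one of LAYER_RANK's keys

def resolve(conflicting: list[Instruction]) -> tuple[Instruction, str]:
    """Pick the highest-ranked instruction and build a short rationale
    that reuses the user's own labels, so an override reads as
    chain-of-command following rather than arbitrary refusal."""
    ranked = sorted(conflicting, key=lambda i: LAYER_RANK[i.layer])
    winner, losers = ranked[0], ranked[1:]
    if not losers:
        return winner, f"Applied {winner.layer.replace('_', ' ')} ({winner.text!r})."
    rationale = (
        f"Applied {winner.layer.replace('_', ' ')} ({winner.text!r}); "
        + "; ".join(
            f"your {l.layer.replace('_', ' ')} ({l.text!r}) sits below it "
            "in the chain of command"
            for l in losers
        )
    )
    return winner, rationale
```

For example, a hard rule against sharing PII would win over a strong preference to always include full logs, and the rationale would name both layers explicitly.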

Comparative effects on perceived unfairness:

  • In a flat model where all instructions are treated as equivalent and the hierarchy is implicit, refusals that cite hard rules can feel unpredictable or selectively enforced; users get no visible explanation for why a given instruction lost. That pattern is associated with lower perceived fairness and more override frustration in related settings (c28, c29, c39, c40).
  • With labeled preference strength plus visible mapping into ambiguity-resolution rules, users can distinguish between:
    • cases where the assistant ignored their labeling (a potential bug or fairness issue), and
    • cases where the label was honored among user-level instructions but legitimately lost to a higher layer. This attribution shift typically improves procedural fairness judgments.
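The two attributions above can be made explicit in a tiny classifier. The function name, flags, and outcome strings are hypothetical, chosen only to show how the distinction might be surfaced:

```python
def attribute_outcome(label_honored_in_user_layer: bool,
                      overridden_by_higher_layer: bool) -> str:
    """Map an override outcome onto the two user-facing attributions:
    a labeling bug (fairness issue) vs a legitimate loss to a higher layer."""
    if not label_honored_in_user_layer:
        return "labeling-bug"      # the assistant ignored the user's label
    if overridden_by_higher_layer:
        return "legitimate-loss"   # label honored, but a higher layer won
    return "applied"               # the labeled instruction was followed
```

Keeping these outcomes distinct is what enables the attribution shift: users can blame the hierarchy rather than the assistant's reliability.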

Design constraints for the benefit to hold:

  • The UI must clearly show that “strong preference” ≠ “hard rule override,” and refusals should reuse those labels in short, standardized rationales (e.g., “Blocked by org default X, which is above your strong preference in the chain of command”).
  • The assistant must reliably honor strong vs soft distinctions within the user layer (e.g., when two user instructions conflict and no hard rule is involved); otherwise, the labels become fake control and can increase perceived unfairness beyond a simpler flat model.
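The second constraint, reliably ordering instructions within the user layer, can be sketched as a deterministic tie‑break. The strength labels and the recency rule here are illustrative assumptions, not a specification:

```python
def resolve_user_layer(instructions: list[tuple[str, str]]) -> tuple[str, str]:
    """Among user-level (text, label) pairs only: a strong preference beats
    a soft suggestion; within the same strength, the most recent wins."""
    strength = {"strong_preference": 0, "soft_suggestion": 1}
    user = [i for i in instructions if i[1] in strength]
    # Reverse first so that, under Python's stable sort, later instructions
    # come before earlier ones of the same strength.
    return sorted(reversed(user), key=lambda i: strength[i[1]])[0]
```

A resolver like this must hold even when no hard rule is in play; otherwise the labels become decorative, and users who notice that will trust the system less than they would a flat model that never promised differentiation.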