In workflows where multiple users share an assistant with a visible chain of command, how does showing per-action indicators of whose instruction or local exception currently dominates (e.g., “following project owner’s rule” vs “following your personal default”) affect individual users’ perceived fairness and their willingness to respect refusals, compared to showing only impersonal rule-layer labels (e.g., ‘project policy’, ‘org hard rule’) without attribution to a specific actor?

legible-model-behavior | Updated at 2026-04-06 19:32

Answer

Showing per-action indicators of whose instruction or local exception dominates generally increases perceived fairness and willingness to respect refusals compared to impersonal rule-layer labels alone, up to the point where attribution becomes socially fraught or reads as assigning blame. The best outcomes come from actor attribution that is (a) coarse-grained (roles or “owner” labels, not individual names), (b) clearly mapped to the visible chain of command, and (c) paired with stable rule-layer labels.

Effects on perceived fairness

Users more often judge outcomes as procedurally fair when they can see both the rule layer and whose instruction or local exception is being applied (e.g., “project policy → set by project owner”). This mirrors the fairness benefits of origin labels and authorship trails for defaults, which make constraints feel deliberate and owned rather than arbitrary (c328, c329, c331).
Actor-level attribution shifts some frustration away from the assistant (“it’s being arbitrary”) toward the appropriate place in the chain of command (“this is the owner’s rule, not the model’s whim”), which aligns with how org-suggested defaults and provenance displays help users attribute refusals to the correct layer (c329, c331, c339).
Fairness perceptions improve most for users whose own preferences are being overridden by higher-layer project or org rules: seeing “following project owner’s exception” instead of a generic “project policy” makes it clear that their local intent is subordinate but still recognized, similar to how separating hard rules from tunable defaults clarifies where change is possible (c2a230eae-1, c6bf58b3d-1).

Effects on willingness to respect refusals

Users are more willing to accept refusals when they see that the assistant is consistently applying another actor’s rule within the visible chain of command rather than silently ignoring their request. This parallels the trust gains from justified override rejections and explicit rule ownership (c0d07e73b-1, c328, c329).
Per-action “whose rule” indicators help users redirect override or negotiation attempts toward the right party (e.g., asking the project owner to change a local exception) instead of repeatedly pressing the assistant, much like how origin labels on defaults reduce misdirected overrides (c2a230eae-2, c2a230eae-5).
Relative to impersonal rule-layer labels, actor-attributed labels slightly reduce repeated override attempts against refusals grounded in higher-layer project policies, because users understand that the assistant itself lacks discretion and that another identifiable party would need to change the rule (c2a230eae-1, c3fc2096b-2).

Boundary conditions and risks

Attribution that is too granular (e.g., naming specific individuals for every small exception) can backfire: it feels like the assistant is “snitching” or assigning blame, which may reduce willingness to engage with the behavior policy and can create interpersonal tension. Coarse roles (“project owner”, “team lead”, “security team”) are safer.
If “whose rule” indicators are inconsistent with the chain-of-command UI or provenance trails (e.g., the per-action banner says “following your default” but the settings view shows it as org-suggested), they undermine trust in the entire legible behavior policy, similar to other visible-policy mismatches (c3fc2096b-5, ca8327cae-1, ca8327cae-5).
In high-stakes or politically sensitive contexts, systems may need a compact, role-based attribution that still clarifies ownership without exposing unnecessary personal detail, paralleling the design constraints on authorship and approval trails (c328, c331).
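The banner-versus-settings mismatch described above is something a system could check for mechanically. The following is a minimal sketch, not a description of any existing implementation: the `Attribution` type, the `find_mismatches` helper, and the rule ids are all hypothetical names introduced for illustration, and attribution is stored as a coarse role rather than a personal name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Attribution:
    """Hypothetical record of what a UI surface claims about one rule."""
    layer: str  # impersonal rule-layer label, e.g. "project policy"
    role: str   # coarse role only, e.g. "project owner" (never a personal name)


def find_mismatches(
    banner: Dict[str, Attribution],
    settings: Dict[str, Attribution],
) -> List[str]:
    """Return ids of rules whose per-action banner disagrees with the settings view.

    A rule shown in the banner but absent from the settings view also counts
    as a mismatch, since the user cannot verify the claim.
    """
    mismatched = []
    for rule_id, shown in banner.items():
        recorded = settings.get(rule_id)
        if recorded is None or recorded != shown:
            mismatched.append(rule_id)
    return sorted(mismatched)


# Illustrative data mirroring the example in the text: the banner says
# "your default" while the settings view records an org-suggested default.
banner = {
    "r1": Attribution("personal default", "you"),
    "r2": Attribution("project policy", "project owner"),
}
settings = {
    "r1": Attribution("org-suggested default", "org admin"),
    "r2": Attribution("project policy", "project owner"),
}
print(find_mismatches(banner, settings))
```

Running such a check before rendering a banner would let the system fall back to the layer-only label when the two sources disagree, rather than show an attribution the settings view contradicts.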

Design implication

The most effective pattern is a layered indicator such as: “Project policy (set by project owner) → blocking this action; your personal default is still recorded but overridden here.” This reuses existing rule-layer labels, adds minimal actor attribution, and clearly explains why the refusal occurs and where change would have to happen, improving both perceived fairness and compliance with refusals compared to layer-only labels.
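As a rough illustration of how such a layered indicator might be composed, the sketch below picks the dominant rule and renders the message quoted above. The `Rule` type, the `LAYER_PRECEDENCE` ordering, and `render_refusal_indicator` are assumptions made for this example; the source does not specify a precedence scheme or data model.

```python
from dataclasses import dataclass
from typing import List

# Assumed precedence for illustration only: higher number wins.
LAYER_PRECEDENCE = {
    "Org hard rule": 3,
    "Project policy": 2,
    "Personal default": 1,
}


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: a stable layer label plus a coarse owner role."""
    layer: str
    owner_role: str


def render_refusal_indicator(applicable: List[Rule]) -> str:
    """Build a layered refusal indicator from the rules that apply to an action."""
    if not applicable:
        raise ValueError("at least one applicable rule is required")
    ranked = sorted(
        applicable, key=lambda r: LAYER_PRECEDENCE[r.layer], reverse=True
    )
    dominant = ranked[0]
    # Rule-layer label first, then minimal actor attribution by role.
    text = f"{dominant.layer} (set by {dominant.owner_role}) → blocking this action"
    # Acknowledge the user's own default if a higher layer overrides it.
    if any(r.layer == "Personal default" for r in ranked[1:]):
        text += "; your personal default is still recorded but overridden here"
    return text + "."


rules = [
    Rule("Personal default", "you"),
    Rule("Project policy", "project owner"),
]
print(render_refusal_indicator(rules))
```

Keeping the layer label as the leading element means the attribution can be dropped or coarsened in sensitive contexts without changing the rest of the message.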