In multi-actor chains of command (system > org > manager > user), how does letting users see who last changed a behavioral default or side-effect control (e.g., ‘org security updated this 3 days ago’) change their willingness to (a) attempt overrides, (b) escalate to a higher layer, and (c) accept refusals that cite those settings?
legible-model-behavior
Answer
Showing users who last changed a behavioral default or side-effect control (with labels such as “org security updated this 3 days ago” or “you changed this yesterday”) generally has three effects: (a) it reduces untargeted override attempts while increasing targeted overrides at the right layer; (b) it increases structured escalation to the responsible higher layer instead of random thrashing; and (c) it increases acceptance of refusals that cite those settings. These effects hold only if the attribution is clear, stable, and aligned with the visible chain of command.
(a) Override attempts
- Users attempt fewer blind or misdirected overrides when they see that a setting was last changed by a higher layer (e.g., org or system). The origin tag signals that the setting is probably enforced by policy and is not just a personal preference.
- Users attempt more focused overrides at their own layer when the UI shows that they or their manager last changed the setting; they treat these as legitimate knobs to adjust rather than trying to bypass org or system rules.
- Compared to systems that show only the current value, origin-aware attribution acts similarly to “org-suggested vs personal” labels: it channels override energy toward editable defaults and away from non-negotiable side-effect controls.
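The routing described in these bullets can be sketched in a few lines. This is a minimal illustration, not a reference design: the `Layer` ordering, the `Setting` fields, and the rule "an actor may edit a setting only if they sit at or above the layer that last changed it" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum


class Layer(IntEnum):
    """Hypothetical chain-of-command layers; a higher value means more authority."""
    USER = 1
    MANAGER = 2
    ORG = 3
    SYSTEM = 4


@dataclass(frozen=True)
class Setting:
    name: str
    value: str
    last_changed_by: Layer      # layer that last edited the setting
    last_changed_at: datetime   # when that edit happened


def override_route(setting: Setting, actor: Layer) -> str:
    """Return 'edit' if the actor may change the setting directly,
    otherwise 'escalate' (toward the layer that last changed it).

    Assumed rule: editing is allowed only at or above the last editor's layer.
    """
    return "edit" if actor >= setting.last_changed_by else "escalate"
```

Under these assumptions, a user facing a setting last changed by `Layer.ORG` is routed to `"escalate"`, while the same user facing a setting they changed themselves is routed to `"edit"`.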
(b) Escalation to higher layers
- Visible “who changed this” metadata increases targeted escalation: when constrained by an org- or manager-set control, users are more likely to escalate to that specific layer (e.g., request a manager change or open a ticket with org security) instead of repeatedly pushing the assistant.
- It reduces noisy escalation against the assistant itself (e.g., rephrasing, complaining to support) for issues that are clearly owned by system/org policy.
- Escalation becomes more procedural: users reference the attribution (“org security locked this scope”) when asking for exceptions, which fits the chain of command better than a generic “the bot won’t let me.”
(c) Acceptance of refusals citing those settings
- Refusals that explicitly cite a setting and its last editor (e.g., “I’m refusing because org security tightened this side-effect control three days ago”) are more likely to be accepted as legitimate applications of the chain of command rather than arbitrary assistant behavior.
- Users distinguish between:
  - refusals grounded in org/system-owned changes, which they accept as policy; and
  - refusals grounded in their own or manager-owned changes, which they are more willing to revise instead of blaming the assistant.
- This mirrors effects seen with visible action budgets and local policies: when constraints can be traced to a specific, named authority and time, they feel more like rule-following and less like opaque model whims.
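A refusal that names the setting, its last editor, and the time of the change could be assembled roughly as follows. The function names, the message wording, and the `user_can_edit` flag are hypothetical and chosen only to illustrate the pattern.

```python
from datetime import datetime, timedelta


def relative_age(changed_at: datetime, now: datetime) -> str:
    """Render the age of a change as a coarse, human-readable phrase."""
    days = (now - changed_at).days
    if days <= 0:
        return "today"
    if days == 1:
        return "yesterday"
    return f"{days} days ago"


def refusal_message(setting: str, editor_label: str, changed_at: datetime,
                    now: datetime, user_can_edit: bool) -> str:
    """Build a refusal that cites the setting, who last changed it, and when.

    The next step depends on whether the user owns the setting (assumed flag).
    """
    age = relative_age(changed_at, now)
    reason = f"I can't do this because {editor_label} changed '{setting}' {age}."
    if user_can_edit:
        next_step = "You can change this in your settings."
    else:
        next_step = f"To request an exception, contact {editor_label}."
    return f"{reason} {next_step}"


if __name__ == "__main__":
    now = datetime(2024, 5, 10, 12, 0)
    print(refusal_message("external sharing", "org security",
                          now - timedelta(days=3), now, user_can_edit=False))
```

Ending the message with either a revise path or an escalation path reflects the distinction above between user-owned and org/system-owned constraints.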
Boundary conditions
- Benefits depend on the attribution being simple and consistent (e.g., “System / Org security / Manager / You”), and on refusals reusing these labels.
- If attributions are wrong, unstable, or suggest editability that does not exist (“manager set this” but manager cannot change it), they can decrease trust and acceptance relative to not showing attribution at all.
- In very sensitive domains, showing the exact individual (vs. the role or layer) may create social pressure or blame; role-level attribution (“org security policy”) usually suffices for the trust and override-handling benefits.
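Two of these boundary conditions, a small fixed label vocabulary and role-level attribution in sensitive domains, can be sketched as below. The `Attribution` type, the `sensitive_domain` flag, and the choice to reject unknown labels are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Fixed vocabulary taken from the example labels in the text above.
ROLE_LABELS = ("System", "Org security", "Manager", "You")


@dataclass(frozen=True)
class Attribution:
    role: str                         # expected to be one of ROLE_LABELS
    individual: Optional[str] = None  # a person's name, if known


def display_label(attr: Attribution, sensitive_domain: bool) -> str:
    """Return the label to show next to a setting or inside a refusal.

    Unknown roles are rejected so the UI and refusals reuse one vocabulary.
    In sensitive domains only the role is shown, never the individual.
    """
    if attr.role not in ROLE_LABELS:
        raise ValueError(f"unknown attribution role: {attr.role!r}")
    if attr.individual and not sensitive_domain:
        return f"{attr.role} ({attr.individual})"
    return attr.role
```

Using the same function for both the settings UI and refusal text is one way to keep the labels consistent between the two.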