In systems that already show per-action chain-of-command traces, does adding an authorship and approval trail to each side-effect control and local exception (who created it, which layer authorized it, when it expires) measurably reduce users’ inclination to blame the assistant itself—rather than the relevant rule layer—when actions are constrained or refused, and does this blame-shift translate into fewer repeated override attempts against hard rules?

legible-model-behavior

Answer

Adding concise authorship and approval trails to each side-effect control and local exception is likely to shift some user blame away from “the assistant” toward the relevant rule layer, and to moderately reduce repeated override attempts against hard rules. Both effects depend on three conditions: the trails are simple, they are consistent with the visible chain of command, and refusal explanations reuse them directly. The effect is probably real but bounded. Trails clarify who owns a constraint and what is negotiable, but they do not eliminate pushback when users see the higher layer (e.g., the organization) as illegitimate or misaligned with their goals.
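To make the mechanism concrete, here is a minimal Python sketch. It assumes a hypothetical three-layer chain of command (platform, org, user) and a hypothetical `Trail` record holding the three fields named in the question: who created the control, which layer authorized it, and when it expires. The point it illustrates is the condition the answer depends on. The refusal text is built from the same trail the user can inspect, so it names the owning layer and says plainly whether the constraint is negotiable. None of these names come from a real system or library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import IntEnum
from typing import Optional


class Layer(IntEnum):
    # Hypothetical chain-of-command layers; a lower value means higher authority.
    PLATFORM = 0
    ORG = 1
    USER = 2


@dataclass(frozen=True)
class Trail:
    """Hypothetical authorship/approval record attached to one control or exception."""
    created_by: str                  # who created the control
    authorized_by: Layer             # which layer authorized it
    expires_at: Optional[datetime]   # None means the control does not expire
    hard_rule: bool                  # True: not changeable from within the conversation


def is_active(trail: Trail, now: datetime) -> bool:
    """A control applies only until its expiry time, if it has one."""
    return trail.expires_at is None or now < trail.expires_at


def refusal_explanation(action: str, trail: Trail, requester: Layer, now: datetime) -> str:
    """Build the refusal text directly from the trail, so blame lands on the owning layer."""
    if not is_active(trail, now):
        raise ValueError("an expired control should not cause a refusal")
    owner = trail.authorized_by.name.lower()
    parts = [f"'{action}' is blocked by a rule set by the {owner} layer "
             f"(created by {trail.created_by})"]
    if trail.expires_at is not None:
        parts.append(f"it expires {trail.expires_at:%Y-%m-%d}")
    # Negotiable only if it is not a hard rule and the requester is at or above the owning layer.
    if not trail.hard_rule and requester <= trail.authorized_by:
        parts.append("you can change it yourself")
    else:
        parts.append(f"it can only be changed by the {owner} layer outside this "
                     f"conversation, so retrying will not help")
    return "; ".join(parts) + "."


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    org_rule = Trail("security-team", Layer.ORG,
                     datetime(2024, 6, 30, tzinfo=timezone.utc), hard_rule=True)
    print(refusal_explanation("send external email", org_rule, Layer.USER, now))
```

The design choice is that the explanation is derived from the trail rather than written separately, which keeps the two consistent with each other and with the visible chain of command. Whether wording like this actually shifts blame or reduces retries is an empirical question the sketch does not answer.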