When assistants surface a compact, always-visible summary of their legible behavior policy (e.g., a 3–5 line “rules & defaults” strip showing the active chain of command and key side-effect controls) during action-taking, does this inline visibility change users’ real-time override behavior and tolerance for refusals compared to keeping the same policy only in a separate settings or documentation view?

legible-model-behavior | Updated at 2026-04-06 18:55

Answer

Yes. Compared with keeping the policy only in a separate settings or documentation view, an always-visible compact summary changes both users' real-time override behavior and their tolerance for refusals. Inline visibility generally (a) reduces misdirected or trial-and-error overrides and (b) increases immediate tolerance for clearly attributed refusals, provided the strip is concise, tied to the chain of command and side-effect controls, and reused verbatim in refusal and override messages.

Effects on real-time override behavior

Users issue fewer blind or repeated override attempts when they can see, at the moment of action, which hard rules and key side-effect controls are active (e.g., “Org finance cap: $500/day; File scope: this project only”). This mirrors the planning and reduced wall-testing seen with visible budgets (c190, c191) and with explicit separation of side-effect controls (c243, c244).
Overrides shift toward editable defaults and away from hard rules: when the strip distinguishes hard rules from defaults and shows which layer is in force, users are more likely to adjust local policies, profiles, or scope than to fight system or organization constraints (consistent with c260, c261; c195, c196, c197, c198).
Inline policy cues reduce the need to open separate settings/docs mid-task, lowering friction for legitimate overrides and making the assistant feel more “explainable in place” rather than configurable only out-of-band.
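
One way to picture the hard-rule versus default distinction is as a small routing step on each override request. The following is a minimal sketch, not a description of any existing system: `Kind`, `StripEntry`, `handle_override`, and the entry keys are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    HARD_RULE = "rule"       # not editable from within the task
    DEFAULT = "default"      # editable by the user in place


@dataclass
class StripEntry:
    layer: str    # which level of the chain of command owns it, e.g. "Org"
    label: str    # the wording shown in the strip
    value: str    # the currently active setting
    kind: Kind


def handle_override(strip, key, new_value):
    """Apply an override only if it targets an editable default.

    Returns a (status, message) pair. The message reuses the entry's
    layer and label so it matches what is visible in the strip.
    """
    entry = strip.get(key)
    if entry is None:
        return ("unknown", f"No visible policy entry named '{key}'.")
    if entry.kind is Kind.HARD_RULE:
        # Hard rules are attributed to their layer instead of being changed.
        return ("refused", f"{entry.layer} rule: {entry.label}: {entry.value} (not editable here)")
    entry.value = new_value
    return ("applied", f"{entry.layer} default: {entry.label}: {entry.value}")
```

In this sketch an override aimed at a hard rule never mutates state and always names the owning layer, while an override aimed at a default takes effect immediately, which is the behavior the strip is meant to steer users toward.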

Effects on tolerance for refusals

Tolerance for refusals tends to increase in real time when the refusal can visually and verbally point back to the visible strip (e.g., highlighting “Org rule: cannot send external emails with attachments” that is already in view). This leverages the same attribution and fairness mechanisms seen with visible chains of command and labeled defaults (c195, c196, c197; c243, c245).
Because the strip keeps the chain of command and side-effect controls salient, users more often read a refusal as consistent rule-following rather than as arbitrary or hidden model behavior. This supports acceptance in the same way as project-level policies and exception flows (c191, c192; c38, c39).
Inline visibility also shortens the explanation needed at refusal time: a brief, standardized rationale that references an element already on-screen can maintain clarity without adding cognitive overload, reinforcing the benefits of concise override-justification patterns (c206, c207, c208).
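
The short, standardized refusal described above can be sketched as a function that quotes the visible strip line verbatim and tells the UI which line to highlight. This is an illustrative assumption only: `build_refusal`, the `highlight_line` field, and the message wording are hypothetical, and the strip lines are taken from the examples in this note.

```python
# Example strip contents, using the illustrative rules quoted in this note.
STRIP = [
    "Org rule: cannot send external emails with attachments",
    "Org finance cap: $500/day",
    "File scope: this project only",
]


def build_refusal(strip, rule_index, requested_action):
    """Build a brief refusal that points back to a line already on screen.

    The rule text is copied verbatim from the strip, never paraphrased,
    so the refusal and the strip use identical wording.
    """
    rule_text = strip[rule_index]
    return {
        "highlight_line": rule_index,  # strip line the UI should highlight
        "message": f'Cannot {requested_action}. Blocked by "{rule_text}".',
    }
```

Because the rationale is a single quoted line, the refusal stays short; the explanation lives in the strip the user can already see.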

Key design conditions

The strip must stay compact (3–5 lines) and stable, with clear separation between hard rules and defaults; otherwise it risks becoming noise and may not materially change override behavior.
Labels and terms in the strip should be reused verbatim in refusals, override-handling messages, and any exception UI so that users can build a consistent mental model across interactions.
If the visible summary appears to promise more flexibility than the system actually allows (e.g., implying defaults are editable when they are de facto hard rules), the mismatch can erode trust and reduce tolerance for refusals, similar to the “fake control” effects noted for misleading profiles and local policies (c262, c263; c39).
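
These conditions lend themselves to a simple pre-release check on the strip. The sketch below is a hedged illustration: `StripLine`, `audit_strip`, and the `actually_editable` field are hypothetical, and it assumes the system can report whether a setting is truly editable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripLine:
    label: str
    shown_as_editable: bool    # what the strip tells the user
    actually_editable: bool    # what the system enforces (assumed to be known)


def audit_strip(lines, refusal_messages=(), min_lines=3, max_lines=5):
    """Return a list of design problems; an empty list means no issue found."""
    problems = []
    if not (min_lines <= len(lines) <= max_lines):
        problems.append(
            f"strip has {len(lines)} lines; expected {min_lines}-{max_lines}"
        )
    for line in lines:
        # "Fake control": presented as a default but enforced as a hard rule.
        if line.shown_as_editable and not line.actually_editable:
            problems.append(f"fake control: '{line.label}' is shown as editable")
    for message in refusal_messages:
        # Verbatim reuse: each refusal should contain some strip label as-is.
        if not any(line.label in message for line in lines):
            problems.append(f"refusal does not reuse a strip label: '{message}'")
    return problems
```

Running such a check whenever the strip or the refusal templates change would catch the two mismatches that matter most here: wording drift and promised flexibility the system does not honor.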