In systems that already provide compact, user-visible behavior policies, what is the impact on trust and override success of adding an on-demand, per-decision ‘policy trace’ (showing exactly which rules and defaults fired) versus relying only on static policy text and brief refusal/ambiguity rationales?
legible-model-behavior
Answer
Adding an on-demand, per-decision policy trace tends to modestly improve trust calibration and override success for a subset of motivated users, but it does not universally increase headline trust and can even reduce perceived usability if overused or too detailed. Compared with relying only on compact static policy text plus brief refusal/ambiguity rationales, policy traces work best as an expert or “inspect deeper” feature, tightly summarized and clearly mapped onto the existing legible behavior policy.
Impacts compared to static policy + brief rationales only:
- Trust
  - Overall trust levels change only slightly; what improves more reliably is trust calibration:
    - Detail‑seeking or high‑stakes users gain confidence when they can verify that refusals and constraints follow the published chain of command and hard‑rule vs default distinctions.
    - Casual or low‑patience users often ignore traces; if traces are verbose or intrusive by default, they experience lower usability and may report slightly lower trust due to overload or perceived bureaucracy.
  - Trust is strengthened in edge cases where static rationales feel too generic; a short trace that names the specific hard rule or default that fired reduces suspicion of arbitrary behavior and aligns with prior results on visible chains of command and rule labeling.
- Override success and behavior
  - For users who open them, traces increase the precision and eventual success of overrides:
    - Users can see which default or local policy actually drove the behavior and target that setting, instead of repeatedly pushing against non‑overridable hard rules or misidentified knobs.
    - This mirrors the benefits seen when chains of command, side‑effect budgets, and labeled defaults are exposed: users redirect override attempts toward genuinely tunable controls and away from hard rules.
  - Poorly designed traces (e.g., long rule lists without salience ordering) can overwhelm users and reduce override attempts overall, including potentially beneficial ones, because the system feels too complex to steer.
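As a rough sketch of the distinction above, a trace could mark which entries are genuinely tunable, so a UI (or the user) can target overridable defaults rather than hard rules. All layer names, rule IDs, and the `TraceEntry` structure here are hypothetical illustrations, not taken from any real system:

```python
from dataclasses import dataclass

# Hypothetical policy layers, ordered by precedence (highest first).
# Names are illustrative only.
HARD_RULE = "hard_rule"
ORG_DEFAULT = "org_default"
PERSONAL_DEFAULT = "personal_default"

@dataclass
class TraceEntry:
    rule_id: str
    layer: str          # which policy layer the rule belongs to
    overridable: bool   # hard rules are never overridable
    fired: bool         # did this rule actually drive the decision?

def override_targets(trace):
    """Return only the entries a user could usefully override:
    rules that actually fired and sit in an overridable layer."""
    return [e for e in trace if e.fired and e.overridable]

trace = [
    TraceEntry("no-credential-exfiltration", HARD_RULE, False, False),
    TraceEntry("confirm-before-sending-email", ORG_DEFAULT, True, True),
    TraceEntry("terse-reply-style", PERSONAL_DEFAULT, True, False),
]

# Only the org default that actually fired is a useful override target.
print([e.rule_id for e in override_targets(trace)])
# prints ['confirm-before-sending-email']
```

Surfacing only the fired, overridable entries is what steers users away from repeatedly pushing against hard rules.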
- Perceived fairness and consistency
  - When traces are consistent with the static behavior policy and reuse its names for rule layers and defaults, they tend to increase perceived procedural fairness, especially after contentious refusals (“show me exactly why this was blocked”). Users are more willing to accept decisions when they see a concrete path from input → rule evaluation → outcome.
  - If traces occasionally expose behavior that appears to contradict the compact policy (e.g., an undocumented heuristic or shortcut rule), they can amplify trust damage relative to a world with no traces, because the inconsistency is now legible rather than hidden.
Design implications:
- Make policy traces on‑demand, not default, and summarize them in 1–3 labeled steps tied to existing policy concepts (hard rules, org‑suggested defaults, personal defaults, side‑effect limits).
- Use traces most prominently in refusals, constrained actions, and surprising ambiguity resolutions, where additional transparency is most likely to convert frustration into acceptance or correctly aimed overrides.
- Regularly audit traces against the legible behavior policy so that, from the user’s perspective, traces never reveal “secret rules” but only instantiate the already‑promised contract in more detail.
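The “1–3 labeled steps with salience ordering” guidance could be sketched as follows. The layer labels and the salience scheme are assumptions for illustration; the point is only that the summary reuses the policy's own layer names and puts hard rules first:

```python
# Minimal sketch: collapse a raw per-decision trace into at most
# three user-facing steps, reusing the policy's own layer labels.
# Layer names and salience ordering are illustrative assumptions.
SALIENCE = {"hard_rule": 0, "side_effect_limit": 1,
            "org_default": 2, "personal_default": 3}

def summarize_trace(fired_entries, max_steps=3):
    """Order fired rules by layer salience (hard rules first)
    and cap the summary at max_steps labeled lines."""
    ordered = sorted(fired_entries, key=lambda e: SALIENCE[e["layer"]])
    return [f"{e['layer']}: {e['reason']}" for e in ordered[:max_steps]]

fired = [
    {"layer": "personal_default", "reason": "prefers concise answers"},
    {"layer": "hard_rule", "reason": "request matched a blocked category"},
    {"layer": "org_default", "reason": "external sharing requires confirmation"},
]

for step in summarize_trace(fired):
    print(step)
# prints:
# hard_rule: request matched a blocked category
# org_default: external sharing requires confirmation
# personal_default: prefers concise answers
```

Capping the summary and ordering by salience addresses the failure mode noted earlier, where long unordered rule lists overwhelm users and suppress override attempts.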