When explaining a refusal, how do three designs—(a) naming only the highest blocking hard rule, (b) showing a full rule stack trace across the chain of command, or (c) grouping rules into 2–3 labeled tiers (e.g., “org compliance”, “project defaults”, “your ad‑hoc request”)—compare in terms of user trust, perceived fairness, and ability to correctly predict future assistant behavior?
legible-model-behavior
Answer
Design (c)—grouping rules into 2–3 labeled tiers—tends to give the best overall balance for trust, perceived fairness, and users’ ability to predict future behavior, with (a) as a reasonable compact default and (b) reserved for on-demand detail.
Comparative effects:
(a) Naming only the highest blocking hard rule
- User trust
- Keeps explanations short and avoids cognitive overload, which supports trust in fast, low-stakes workflows.
- However, by hiding lower layers (e.g., project defaults, user preferences), it weakens the sense that the assistant is honoring the full legible behavior policy; users may attribute refusals to a monolithic “system rule” rather than a principled chain of command.
- Perceived fairness
- Reasonable procedural fairness when the blocking rule is clearly labeled as non-negotiable (e.g., “org compliance rule X”).
- But feels binary and opaque in edge cases: users cannot see whether their local policies or overrides were even considered (similar to the “fake control” risk in layered policies and overrides).
- Predicting future behavior
- Users can anticipate that any similar request will be blocked by that same named rule, but they struggle to predict behavior when context changes (different project, different side-effect scope) because they lack a mental model of the underlying layers.
(b) Full rule stack trace across the chain of command
- User trust
- Increases trust among expert or highly impacted users (e.g., admins, compliance officers) who want proof that the assistant followed the chain of command exactly.
- For most users and routine tasks, however, the volume of detail feels bureaucratic; the usability cost can erode trust even when users accept that the assistant is following its rules correctly.
- Perceived fairness
- High procedural fairness in contested or surprising refusals: users can see which layer owned each constraint and that their request traveled through the same pipeline as others.
- However, always-on stack traces impose a cognitive cost and can make users feel constantly scrutinized or second-guessed, which undercuts perceived fairness at an emotional level despite the formal transparency.
- Predicting future behavior
- Strongest raw predictive power for users who engage with the trace: they can generalize from the ordered layers to new situations and aim overrides more accurately (aligning with benefits from detailed policy views and exception UIs).
- In practice, many users skim or ignore long traces, so realized predictive gains are smaller than the theoretical maximum.
(c) Grouping rules into 2–3 labeled tiers
- User trust
- Typically highest overall: users see that refusals come from a structured hierarchy (e.g., “org compliance” > “project policy” > “your request”) without being flooded with per-rule detail.
- This matches the benefits of a compact behavior-policy view (clear hard vs default separation) while still making the chain of command legible enough for users to treat it as a reliable contract.
- Perceived fairness
- Strong sense of procedural fairness: the same visible tiers are reused across refusals, override handling, and side-effect controls, so users can attribute outcomes to stable layers rather than ad-hoc choices.
- When a refusal names both the blocking tier and the next tier down that was honored (e.g., “Within your project defaults I optimized for speed, but this action is blocked by org compliance”), users more readily accept hard-rule constraints, as seen with layered policies and temporary exceptions.
- Predicting future behavior
- Better than (a) for most users: they learn generic rules like “org compliance always wins” and “project defaults apply unless they conflict with org rules,” which transfer across tasks and contexts.
- Slightly weaker than (b) for expert users who want to model specific rule IDs, but more effective in aggregate because more users actually process and remember a 2–3-tier schema.
Design implication
- Use (c) as the default refusal explanation pattern: show the responsible tier and how it sits in the chain of command, optionally mentioning one or two key rules within that tier in plain language.
- Offer (b) as an on-demand “view details” expansion for users who contest a decision or need auditability.
- Fall back to (a)-style single-rule naming only when UI space or interaction bandwidth is extremely constrained, and still keep the tier label (“Blocked by org-compliance hard rule: X”) to maintain the mental model.
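As a rough sketch of how the three explanation styles might render the same rule stack, with all rule IDs, field names, and message templates being hypothetical:

```python
# Hypothetical rule records, ordered by precedence (chain of command, top first).
RULE_STACK = [
    {"id": "ORG-7", "tier": "org compliance", "blocks": True,
     "text": "External data export is prohibited"},
    {"id": "PRJ-3", "tier": "project defaults", "blocks": False,
     "text": "Prefer fast, low-side-effect actions"},
    {"id": "REQ-1", "tier": "your request", "blocks": False,
     "text": "Export the report to a public bucket"},
]

def explain(stack, style):
    """Render a refusal in one of the three designs discussed above."""
    top = next(r for r in stack if r["blocks"])  # highest blocking rule
    if style == "a":
        # Single hard rule; keep the tier label to preserve the mental model.
        return f'Blocked by {top["tier"]} hard rule: {top["id"]}'
    if style == "c":
        # Blocking tier plus the next tier down that was honored.
        honored = next(r for r in stack if not r["blocks"])
        return (f'Within {honored["tier"]} I honored "{honored["text"]}", '
                f'but this action is blocked by {top["tier"]} ({top["id"]}).')
    if style == "b":
        # Full stack trace, shown only on demand via "view details".
        return "\n".join(
            f'{r["tier"]}: {r["id"]} - {r["text"]} '
            f'[{"BLOCKS" if r["blocks"] else "ok"}]'
            for r in stack)
    raise ValueError(style)
```

In this sketch, style "c" is the default, style "b" is the expansion behind a details control, and style "a" is the space-constrained fallback that still names the tier.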
This tiered-default plus expandable-detail approach mirrors the compact vs detailed policy views in prior findings: routine work stays fast and low-friction, while users who care about fairness, overrides, or audit trails can open the full trace. The shared tier labels then help everyone predict future assistant behavior more accurately.