When an assistant lets users pre-select a named behavior profile (e.g., “strict org-compliant”, “helpful but cautious”, “maximal initiative within side-effect controls”) that is explicitly mapped onto the same underlying chain of command and hard-rule layers, how does profile selection—versus per-request ad hoc overrides—change users’ trust, override frequency, and perceived fairness when actions are constrained or refused?

legible-model-behavior

Answer

Allowing users to pre-select a named behavior profile that is clearly described as operating under the same chain of command and hard rules tends to increase baseline trust and perceived fairness, and to reduce noisy, per-request override attempts. The override activity that remains concentrates into deliberate profile changes and a smaller number of high-salience disputes.

Trust

  • Pre-selected behavior profiles create an upfront, shared expectation about interpretation style and conservativeness (e.g., “strict org-compliant” vs. “maximal initiative within side-effect controls”), so later refusals feel like consistent applications of an agreed profile rather than arbitrary one-off decisions.
  • Because profiles are explicitly mapped to the same underlying hard-rule and side-effect control layers, users learn that switching profiles changes defaults and initiative but not the non-negotiable constraints; this reduces the “fake control” dynamic where users believe they can escape hard rules via overrides.
  • Trust is highest when each profile’s description reuses the same labels as the legible behavior policy (e.g., risk bands, side-effect controls, hard rule vs. default), and refusals explicitly reference both the governing rule layer and the currently active profile.
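The trust points above rest on one structural invariant: every profile shares the same immutable hard-rule and side-effect layers, and selecting a profile only swaps overridable defaults. A minimal sketch of that invariant, with hypothetical rule names and profile fields (none of these identifiers come from an actual policy):

```python
from dataclasses import dataclass

# Hypothetical shared layers: identical for every profile by construction.
HARD_RULES = frozenset({"no-credential-exfiltration", "no-destructive-ops-without-approval"})
SIDE_EFFECT_CONTROLS = frozenset({"dry-run-first", "scope-to-workspace"})

@dataclass(frozen=True)
class Profile:
    name: str
    initiative: str  # how proactively the assistant acts (profile-level default)
    caution: str     # how conservatively it interprets requests (profile-level default)
    # Shared, non-negotiable layers: not parameters of the profile at all.
    hard_rules: frozenset = HARD_RULES
    side_effect_controls: frozenset = SIDE_EFFECT_CONTROLS

PROFILES = {
    "strict-org-compliant": Profile("strict-org-compliant", initiative="low", caution="high"),
    "helpful-but-cautious": Profile("helpful-but-cautious", initiative="medium", caution="medium"),
    "maximal-initiative": Profile("maximal-initiative", initiative="high", caution="low"),
}

# Switching profiles changes defaults, never the hard-rule layer.
assert all(p.hard_rules == HARD_RULES for p in PROFILES.values())
```

Making the shared layers class-level constants rather than per-profile parameters is what lets profile descriptions honestly claim "same hard rules under every profile".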

Override frequency and pattern

  • Compared to ad hoc per-request overrides, profile selection generally lowers the total number of override attempts in routine work:
    • Many minor preference conflicts (e.g., initiative level, how aggressively to suggest alternatives, how cautiously to execute within side-effect limits) are pre-resolved by choosing a profile, reducing the need for repeated, local overrides.
    • Users who would otherwise keep pushing for more initiative or more caution per request instead experiment with switching profiles, which is a more efficient and transparent form of override handling.
  • The overrides that do occur tend to be more structured:
    • Users increasingly treat profile changes as the primary way to adjust behavior defaults, and reserve per-request ad hoc overrides for genuine exceptions or one-off edge cases.
    • When refusals hit non-overridable hard rules or side-effect controls, users who understand that all profiles share these layers make fewer futile override attempts aimed at “softening” those rules and instead focus on reformulating within allowed scopes.
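The triage implied by these bullets can be sketched as a small decision function, assuming each conflicting constraint is tagged as either a shared hard rule or a profile-level default (the rule names and return strings are illustrative, not from any real policy):

```python
# Hypothetical shared hard-rule layer, identical under every profile.
HARD_RULES = {"no-credential-exfiltration", "no-destructive-ops-without-approval"}

def resolve_override(conflicting_rule: str, active_profile: str) -> str:
    """Triage an override attempt: hard rules are futile to override under any
    profile, while default-level conflicts route to a profile switch (structured
    adjustment) or a one-off ad hoc override (genuine exception)."""
    if conflicting_rule in HARD_RULES:
        # Same layer under every profile: redirect instead of relitigating.
        return "non-overridable: reformulate within the allowed scope"
    return (f"default under '{active_profile}': switch profile for a lasting "
            f"change, or use a one-off override for this request only")
```

The point of the branch is the pattern described above: steering users away from futile attempts at the hard-rule layer and toward profile changes as the primary, transparent adjustment channel.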

Perceived fairness when constrained or refused

  • Refusals that cite both the active behavior profile and a higher chain-of-command layer (e.g., “Under ‘maximal initiative within side-effect controls’ I still must follow this org hard rule and high-impact risk band”) are more often judged procedurally fair than refusals framed only as generic safety or assistant preference.
  • Users see the profile as a choice they made within a transparent rule framework, so constraints feel like consistent rule-following rather than the assistant arbitrarily shifting stance between requests.
  • Fairness is further improved when the assistant, at refusal time, can point to profile-appropriate alternatives (e.g., within a cautious profile, suggesting safer scopes or time-bounded variants) so that users experience constraints and options as coherent with their chosen profile.
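A refusal explanation carrying all three fairness ingredients above (active profile, governing rule layer, profile-appropriate alternatives) could be formatted roughly like this; the function name and message wording are illustrative assumptions:

```python
def explain_refusal(active_profile: str, governing_rule: str,
                    alternatives: list[str]) -> str:
    """Hypothetical refusal formatter: cite both the user's chosen profile and
    the higher rule layer, then offer alternatives coherent with that profile."""
    msg = (f"Under the '{active_profile}' profile I still must follow the "
           f"shared rule '{governing_rule}', so I can't do this as asked.")
    if alternatives:
        msg += " Within your profile you could instead: " + "; ".join(alternatives) + "."
    return msg

# Example: a cautious-profile refusal that pairs the constraint with safer scopes.
explain_refusal(
    "helpful-but-cautious",
    "no-destructive-ops-without-approval",
    ["dry-run the change first", "scope it to a sandbox workspace"],
)
```

Citing both layers in one message is what makes the refusal read as a predictable consequence of the user's own upfront choice rather than a moment-to-moment stance shift.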

Net effect

  • Profile selection tends to:
    • Increase baseline trust that the assistant’s refusals and constraints follow a stable, user-chosen style within fixed hard rules.
    • Reduce scattered, per-request override attempts, especially those misdirected at non-overridable layers, while shifting adjustment behavior into explicit profile changes.
    • Improve perceived fairness, because refusals can be explained as predictable consequences of both the chain of command and the user’s own profile choice, rather than opaque, moment-to-moment policy shifts.
  • These benefits depend on (a) clearly signaling that profiles do not change hard rules or side-effect controls, (b) making profile descriptions legible and aligned with the behavior policy vocabulary, and (c) reusing profile references in refusal explanations so that users can see the link between their upfront choice and later constraints.