If behavior profiles are allowed to differ only in ambiguity-resolution defaults (with identical hard rules and side-effect controls), does showing users a brief, profile-specific ‘failure fingerprint’ (e.g., typical refusal/deferral patterns and their main cited rules) reduce perceptions of hidden hard rules and make users more willing to stick with a stricter profile instead of switching away after early refusals?
Answer
Yes. When behavior profiles differ only in ambiguity-resolution defaults, showing a short, profile-specific “failure fingerprint” tends to (a) reduce perceptions of hidden hard rules and (b) increase users’ willingness to stick with stricter profiles after early refusals. This holds provided the fingerprint is clearly framed as describing defaults, is tied to visible rule labels, and is kept very concise.
Effects on perceived hidden hard rules
- A failure fingerprint that summarizes “what this profile tends to say no or pause on, and why” helps users attribute early refusals to the stricter ambiguity defaults rather than to undisclosed hard rules. This mirrors findings that users tolerate profile-level refusal differences when they are clearly tied to visible, user-chosen defaults rather than to opaque enforcement.
- Perceptions of hidden hard rules drop most when:
  - The fingerprint explicitly distinguishes ambiguity-driven failures from side-effect controls and hard rules (e.g., “Most refusals here are because this profile stops when details are missing, not because of policy blocks”).
  - Refusal rationales reuse the same labels as the fingerprint (e.g., “Refusing under the ‘Strict ambiguity’ default of this profile”) rather than generic “safety” language.
  - All profiles reiterate that hard rules and side-effect controls are shared, so users don’t infer profile-specific hard constraints.
- If fingerprints are vague, overgeneral (“this profile is cautious”), or occasionally contradicted by observed behavior, users may instead infer that the fingerprint is marketing gloss on hidden rules, which undercuts the benefit.
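The labeling conditions above can be illustrated with a minimal sketch of a refusal message builder. It assumes each refusal is tagged with one of three sources (ambiguity default, side-effect control, hard rule). The names `RefusalSource`, `Refusal`, and `explain_refusal`, the example labels, and the message wording are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RefusalSource(Enum):
    # Hypothetical categories, following the three sources named in the text.
    AMBIGUITY_DEFAULT = "ambiguity default"
    SIDE_EFFECT_CONTROL = "side-effect control"
    HARD_RULE = "hard rule"


@dataclass(frozen=True)
class Refusal:
    source: RefusalSource
    rule_label: str           # should match the fingerprint wording verbatim
    missing_detail: str = ""  # what the user could supply to proceed, if anything


def explain_refusal(refusal: Refusal, profile_name: str) -> str:
    """Build a rationale that names its source instead of generic 'safety' language."""
    if refusal.source is RefusalSource.AMBIGUITY_DEFAULT:
        msg = (f"Refusing under the '{refusal.rule_label}' default of the "
               f"{profile_name} profile.")
        if refusal.missing_detail:
            msg += f" Missing detail: {refusal.missing_detail}."
        return msg
    # Hard rules and side-effect controls are shared, so the message says so.
    return (f"Refusing under the shared {refusal.source.value} "
            f"'{refusal.rule_label}', which applies in every profile.")
```

For example, `explain_refusal(Refusal(RefusalSource.AMBIGUITY_DEFAULT, "Strict ambiguity", "recipient list"), "Careful")` attributes the refusal to the profile default, while a refusal tagged `HARD_RULE` is explicitly described as shared across profiles.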
Effects on willingness to stick with stricter profiles
- Early refusals in stricter profiles often trigger switching or abandonment when users believe profiles are interchangeable but see one behaving more restrictively in opaque ways. Fingerprints that preview typical failure modes (“You will see more pauses when recipients or scopes are underspecified”) make these refusals feel like the predictable cost of a chosen interpretation style.
- Users are more willing to keep a stricter profile when:
  - The fingerprint makes the tradeoff legible in one or two dimensions (e.g., “higher refusal rate under unclear instructions, fewer accidental cross-project edits”), connecting stricter ambiguity defaults to benefits like error reduction or safer side-effect patterns.
  - The assistant consistently cites the profile’s ambiguity defaults in early refusals, matching exactly what the fingerprint promised; this predictability supports a sense of procedural fairness.
  - Switching costs are made salient (e.g., warning that a more permissive profile will act more under uncertainty) so profile changes feel like rule-governed choices, not free fixes to arbitrary obstinacy.
- If fingerprints are too long, too technical, or mixed with unrelated policy detail, they can overwhelm users and be ignored, in which case switching behavior remains dominated by immediate frustration with refusals.
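One way to monitor whether early refusals match what the fingerprint promised is a simple label check. The sketch below is an assumption about how such a check might look, not a described implementation: `unadvertised_refusals` and its parameters are hypothetical names, and it assumes every refusal carries a single rule label.

```python
from typing import Iterable, List


def unadvertised_refusals(
    fingerprint_labels: Iterable[str],
    shared_labels: Iterable[str],
    early_refusal_labels: Iterable[str],
) -> List[str]:
    """Return labels cited in early refusals that the user was never shown.

    fingerprint_labels: rule labels advertised in the profile's fingerprint.
    shared_labels: labels of hard rules and side-effect controls common to all profiles.
    early_refusal_labels: labels cited in the first few refusals, in order.
    """
    known = set(fingerprint_labels) | set(shared_labels)
    surprises: List[str] = []
    for label in early_refusal_labels:
        # Keep first-seen order and report each unexpected label once.
        if label not in known and label not in surprises:
            surprises.append(label)
    return surprises
```

A non-empty result would flag refusals that users may read as evidence of hidden rules, since they cite a label that appears in neither the fingerprint nor the shared-rules list.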
Design implications
- Keep fingerprints very compact: 3–5 bullet patterns (e.g., “often defers when X; usually executes partially when Y; fully refuses when Z, citing rule label R”). Show them at profile selection time and again after the first few refusals as a reminder.
- Explicitly state in every fingerprint that hard rules and side-effect controls are identical across profiles, and visually separate that statement from the list of ambiguity-driven patterns.
- Reuse the same terms and rule labels from the fingerprint in refusal messages and explanations so users build a stable mapping from observed failures back to the advertised profile behavior.
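The design implications above can be sketched as a small data structure. This is a minimal illustration under stated assumptions: `Pattern`, `FailureFingerprint`, `SHARED_RULES_NOTE`, and the rendered wording are hypothetical, and the 3 to 5 pattern limit simply encodes the compactness guideline from the list.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical wording for the statement every fingerprint repeats.
SHARED_RULES_NOTE = "Hard rules and side-effect controls are identical across all profiles."


@dataclass(frozen=True)
class Pattern:
    behavior: str    # e.g. "defers", "partially executes", "refuses"
    condition: str   # when the behavior typically occurs
    rule_label: str  # label reused verbatim in refusal messages


@dataclass(frozen=True)
class FailureFingerprint:
    profile_name: str
    patterns: List[Pattern]

    def __post_init__(self) -> None:
        # Enforce the 3-5 pattern budget suggested in the text.
        if not 3 <= len(self.patterns) <= 5:
            raise ValueError("a fingerprint should list 3 to 5 patterns")

    def render(self) -> str:
        lines = [f"{self.profile_name}: typical pauses and refusals"]
        for p in self.patterns:
            lines.append(f"- Usually {p.behavior} when {p.condition} (rule: {p.rule_label})")
        # A blank line keeps the shared-rules statement visually separate.
        lines.extend(["", SHARED_RULES_NOTE])
        return "\n".join(lines)
```

Keeping `rule_label` as a plain field on each pattern means the same string can be passed to whatever builds refusal messages, so the fingerprint and the refusals stay in the same vocabulary.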
Under these conditions, profile-specific failure fingerprints generally reduce misattribution of stricter behavior to hidden hard rules and make users more willing to continue using stricter ambiguity-resolution profiles instead of immediately switching away after a few refusals.