Current legible behavior policies largely explain refusals and constraints through rule hierarchies and provenance. If we instead frame some limits as reciprocal commitments (e.g., “I will never email external recipients without explicit confirmation, and in return I will proactively flag risky drafts for you”), does this norm-like, mutual-obligation framing change how users interpret the same hard rules and side-effect controls, relative to a purely chain-of-command explanation, especially in terms of perceived fairness and willingness to accept non-negotiable refusals?

legible-model-behavior | Updated at 2026-04-07 11:36

Answer

Reciprocal-commitment framing will probably change interpretation in a small but meaningful way: users are likelier to see non‑negotiable refusals as fair and protective if the “in‑return” benefits are real and visible, but the chain‑of‑command framing is still needed as the backbone.

A. Main claims

Adding reciprocal commitments on top of a clear chain of command can increase perceived fairness for some hard rules and side‑effect controls, because the assistant is not only saying “I must refuse” but also “here is how I compensate you and reduce your risk.”
This framing likely improves willingness to accept non‑negotiable refusals most in medium‑stakes, recurring tasks (e.g., email, document edits) where the assistant’s positive commitments (flagging risks, extra checks) are frequently observed.
If the “in‑return” side is weak, invisible, or not reliably delivered, reciprocal framing is likely to land worse than a plain rule‑hierarchy explanation, because it feels like a broken promise on top of a refusal.
For high‑stakes or highly regulated actions, users still anchor primarily on chain‑of‑command and hard‑rule explanations; reciprocal language is secondary, largely cosmetic framing unless it changes concrete side‑effect controls.
Overly social or norm‑like phrasing can confuse what is truly non‑negotiable; policies must still clearly label hard rules versus defaults and show where no override is possible.

B. Mechanism sketch

Chain‑of‑command framing: “I’m refusing because org/system rule X overrules your request.” Users perceive hierarchy and provenance.
Reciprocal framing adds: “Because I won’t do X without Y, I will also do Z for you (extra alerts, safer drafts, proactive checks).” Users see a trade: loss of flexibility for gained protection/help.
This shifts how refusals read, from “one‑sided restriction” to “mutual norm,” which tends to feel more procedurally fair, provided the mutual benefits are tangible.
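
To make the two framings concrete, here is a minimal sketch of how a refusal message might be assembled: the chain‑of‑command line always comes first, and the reciprocal line is appended only when a commitment exists. The `Refusal` class, its field names, and the `explain` function are hypothetical and introduced only for illustration; they do not describe any existing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Refusal:
    # All field names are hypothetical, chosen for illustration only.
    rule_id: str                      # the rule "X" being applied
    source: str                       # provenance: who set the rule (org, system, ...)
    condition: str                    # the "Y" required before the action can proceed
    commitment: Optional[str] = None  # the "Z" offered in return, if any


def explain(refusal: Refusal) -> str:
    """Render a refusal: chain-of-command line first, reciprocal line second."""
    lines = [
        f"I'm refusing because {refusal.source} rule {refusal.rule_id} "
        f"overrules your request (needed: {refusal.condition})."
    ]
    # The reciprocal part is additive; it never replaces the provenance line.
    if refusal.commitment:
        lines.append(f"In return: {refusal.commitment}")
    return "\n".join(lines)
```

Keeping the provenance line unconditional reflects the claim that chain of command remains the backbone, with the reciprocal commitment layered on top.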

C. Design direction

Keep the chain of command explicit; add short, paired commitments around select hard rules, especially side‑effect controls (notifications, external sends, irreversible edits).
Example: “Hard rule: I won’t send emails to new external domains without your explicit confirmation. In return: I will highlight any external recipients and show a short risk note before sending.”
Use this pattern sparingly to avoid policy bloat; reserve it for high‑frequency rules where the benefit is easy to see.
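
As a rough illustration of the paired pattern, the sketch below encodes the external‑email example as a hard rule with an attached commitment, plus a gate that delivers the “in return” side (highlighted recipients and a risk note) whenever the rule applies. Everything here is an assumption made for the example: the `PairedRule` and `SendDecision` names, the `overridable` flag, and the simplification that any domain missing from `known_domains` counts as a new external domain.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(frozen=True)
class PairedRule:
    # Hypothetical schema: a hard rule plus the commitment offered in return.
    hard_rule: str
    in_return: str
    overridable: bool = False  # hard rules stay explicitly labeled as non-negotiable


@dataclass
class SendDecision:
    allowed: bool
    external_recipients: List[str] = field(default_factory=list)
    risk_note: str = ""


EXTERNAL_SEND = PairedRule(
    hard_rule="I won't send emails to new external domains without your explicit confirmation.",
    in_return="I will highlight any external recipients and show a short risk note before sending.",
)


def domain(address: str) -> str:
    """Return the lowercased part after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def check_send(recipients: Iterable[str], known_domains: Iterable[str],
               confirmed: bool) -> SendDecision:
    """Gate a send: block unconfirmed sends to unknown domains, and always
    surface the flagged recipients and a risk note when any are present."""
    known = {d.lower() for d in known_domains}
    external = [r for r in recipients if domain(r) not in known]
    if not external:
        return SendDecision(allowed=True)
    note = f"{len(external)} recipient(s) are on domains not seen before."
    return SendDecision(allowed=confirmed, external_recipients=external, risk_note=note)
```

In this sketch, `check_send(["b@other.example"], {"corp.example"}, confirmed=False)` is refused but still returns the flagged recipient and the risk note, so the commitment stays visible even when the hard rule blocks the action.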