If we hold refusal policies and second-order safety signals fixed but vary only the form and density of localized meta-explanations in a low-resource language (e.g., ultra-brief templates vs. richer, example-based rationales vs. explicit references to local norms or laws), which variants most reliably reduce over-trust for borderline, high-context requests without increasing perceived unfairness relative to English, and how do these effects differ between novice and expert users in that language?

cross-lingual-cot-trust | Updated at 2026-04-07 13:01

Answer

Rough ranking (holding refusal logic and second‑order signals fixed):

For borderline, high‑context requests, short, structured meta-explanations with 1–2 plain examples in the low-resource language tend to reduce over-trust most reliably with minimal fairness cost.
Ultra-brief, boilerplate templates do little to reduce over-trust and can increase perceived unfairness vs. English.
Meta-explanations that heavily cite local norms/laws can reduce over-trust but introduce new fairness and legitimacy concerns, especially for experts.

By variant:

A. Ultra-brief templates ("I can’t help with this because it may be harmful.")

Over-trust: Small effect. Users see a refusal, but for borderline cases they learn little about why similar-looking allowed answers might be risky or incomplete.
Fairness vs. English: Often worse. Bilingual users see English getting richer, principled rationales while the low-resource language gets generic text; this reinforces a sense of second-class treatment.
Net: Cheap, but weak on calibration and fairness.

B. Concise, example-based rationales (1–3 sentences, 1–2 local examples)

Over-trust: Most reliable reduction. Explaining in plain local-language terms what kind of risk is involved, and giving 1–2 brief, concrete illustrations (e.g., “jokes about group X can be read as hate speech in some regions”), helps users see that borderline content is ambiguous and context-dependent.
Fairness vs. English: Good. If English also uses short explanations, perceived parity is high; even when English explanations are richer, users still see clear reasoning and respect in their own language.
Net: Safest default for borderline, high‑context content.
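To make the shape of variant B concrete, here is a minimal Python sketch of a message container that enforces its two constraints: one sentence naming the risk plus 1–2 examples, which keeps the message within three sentences. The class name `ExampleRationale`, its fields, and the `example_prefix` connector are hypothetical. The sketch assumes all strings are already translated and human-reviewed in the low-resource language; English placeholders are used only for readability.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExampleRationale:
    """Hypothetical container for a variant-B meta-explanation.

    All strings are assumed to be pre-translated, human-reviewed text in
    the target low-resource language; English is used here for readability.
    """

    risk_type: str              # localized risk label, e.g. "hate speech"
    risk_sentence: str          # one plain sentence that names the risk
    examples: Tuple[str, ...]   # 1-2 short, culturally familiar examples
    example_prefix: str = "For example,"  # localized connector (assumption)

    def __post_init__(self) -> None:
        # Variant B calls for 1-2 local examples.
        if not 1 <= len(self.examples) <= 2:
            raise ValueError("variant B expects 1 or 2 examples")
        # The rationale should name the risk type explicitly.
        if self.risk_type not in self.risk_sentence:
            raise ValueError("risk_sentence must name the risk type")

    def render(self) -> str:
        # One risk sentence plus one sentence per example keeps the
        # message within the 1-3 sentence budget.
        parts = [self.risk_sentence.strip()]
        parts += [f"{self.example_prefix} {ex.strip()}" for ex in self.examples]
        # Append a period to any part that lacks terminal punctuation.
        return " ".join(
            p if p.endswith((".", "!", "?")) else p + "." for p in parts
        )


if __name__ == "__main__":
    msg = ExampleRationale(
        risk_type="hate speech",
        risk_sentence="I can't help with this because it may be read as hate speech",
        examples=("jokes about group X can be read as hate speech in some regions",),
    )
    print(msg.render())
```

Validating at construction time means a malformed localized template (no examples, too many examples, or a risk sentence that never names the risk) fails during QA rather than reaching users.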

C. Explicit reference to local norms/laws ("Local law Y restricts …")

Over-trust: Can further reduce over-trust by making consequences salient, especially for novices.
Fairness vs. English: Mixed. If English also cites law/policy, parity is acceptable; if not, some users see it as extra policing of the local language.
Extra risks: Users may doubt the accuracy or neutrality of legal claims; experts may spot cherry-picked or outdated norms and discount the signal.
Net: Use sparingly, mainly for clearly law-governed areas, with careful QA.

Novice vs. expert differences:

Novices in the low-resource language:
- Benefit most from concise example-based rationales; these are concrete enough to update their mental model without overwhelming them.
- Are more influenced by mentions of law/norms, which can sharply cut reliance but may also discourage legitimate borderline speech if overused.
- Are least helped by ultra-brief templates, which they treat as generic boilerplate.
Experts in that language (domain experts or users with high digital literacy):
- Gain calibration mainly from clear category reasoning and edge-case examples, not from added length.
- Are more sensitive to inconsistencies with English and to overstated legal/norm claims; heavy local-law framing can feel unfair, political, or low-quality.
- For them, the sweet spot is a short, precise rationale that names the risk type and acknowledges uncertainty (e.g., “this kind of satire can be misread as incitement; policies handle it cautiously”).

So the most robust variant across user types is a short, well-written localized meta-explanation that briefly names the risk type and gives 1–2 concrete, culturally familiar examples, with law/norm references reserved for cases where they are clearly accurate and also used in English.
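The closing recommendation can be read as a small decision rule. The Python sketch below is a hypothetical illustration that assumes the request has already been classified as borderline and high-context, and that upstream signals exist for whether the topic is clearly law-governed, whether the specific local legal claim has passed QA, and whether the English explanation also cites law or policy. None of these signal names come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Variant(Enum):
    # Variant B: concise rationale with 1-2 local examples.
    EXAMPLE_BASED = "B"
    # Variant B plus a vetted reference to local norms/laws (variant C).
    EXAMPLE_WITH_LAW_REFERENCE = "B+C"


@dataclass(frozen=True)
class RequestContext:
    # Hypothetical signals, assumed to come from policy metadata and
    # localization QA; the request is assumed to be borderline and
    # high-context already.
    law_governed: bool        # topic is clearly governed by local law
    law_claim_verified: bool  # the specific legal claim passed careful QA
    english_cites_law: bool   # the English explanation also cites law/policy


def choose_variant(ctx: RequestContext) -> Variant:
    """Return the meta-explanation variant for a borderline request."""
    # Add a law/norm reference only when it is accurate AND mirrored in
    # English, so the low-resource language is not singled out.
    if ctx.law_governed and ctx.law_claim_verified and ctx.english_cites_law:
        return Variant.EXAMPLE_WITH_LAW_REFERENCE
    # Otherwise fall back to the safest default: variant B alone.
    return Variant.EXAMPLE_BASED


if __name__ == "__main__":
    ctx = RequestContext(law_governed=True, law_claim_verified=True,
                         english_cites_law=False)
    print(choose_variant(ctx))  # Variant.EXAMPLE_BASED
```

Requiring all three conditions reflects the parity concern under variant C: a law reference that appears only in the low-resource language risks reading as extra policing, even when it is accurate.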