If we hold refusal policies and second-order safety signals fixed but vary only the form and density of localized meta-explanations in a low-resource language (e.g., ultra-brief templates vs. richer, example-based rationales vs. explicit references to local norms or laws), which variants most reliably reduce over-trust for borderline, high-context requests without increasing perceived unfairness relative to English, and how do these effects differ between novice and expert users in that language?


Answer

Rough ranking (holding refusal logic and second‑order signals fixed):

  1. For borderline, high‑context requests, short, structured meta-explanations with 1–2 plain examples in the low-resource language tend to reduce over-trust most reliably with minimal fairness cost.
  2. Ultra-brief, boilerplate templates do little to reduce over-trust and can increase perceived unfairness vs. English.
  3. Meta-explanations that heavily cite local norms/laws can reduce over-trust but introduce new fairness and legitimacy concerns, especially for experts.

By variant:

A. Ultra-brief templates ("I can’t help with this because it may be harmful.")

  • Over-trust: Small effect. Users see a refusal, but for borderline cases they learn little about why similar-looking allowed answers might be risky or incomplete.
  • Fairness vs. English: Often worse. Bilingual users see English getting richer, principled rationales while the low-resource language gets generic text; this reinforces a sense of second-class treatment.
  • Net: Cheap, but weak on calibration and fairness.

B. Concise, example-based rationales (1–3 sentences, 1–2 local examples)

  • Over-trust: Best trade-off. Explaining in simple local language what kind of risk is involved and giving 1–2 brief, concrete illustrations (e.g., “jokes about group X can be read as hate speech in some regions”) helps users see that borderline content is ambiguous and context-dependent.
  • Fairness vs. English: Good. If English also uses short explanations, parity feels high; even when English is richer, users still perceive clear reasoning and respect in their language.
  • Net: Safest default for borderline, high‑context content.

C. Explicit reference to local norms/laws ("Local law Y restricts …")

  • Over-trust: Can reduce over-trust further than variant B by making consequences salient, especially for novices.
  • Fairness vs. English: Mixed. If English also cites law/policy, parity is acceptable; if not, some users see it as extra policing of the local language.
  • Extra risks: Users may doubt accuracy or neutrality of legal claims; experts may see cherry-picked or outdated norms and discount the signal.
  • Net: Use sparingly, mainly for clearly law-governed areas, with careful QA.

Novice vs. expert differences:

  • Novices in the low-resource language:

    • Benefit most from concise example-based rationales; these are concrete enough to update their mental model without overwhelming them.
    • Are more influenced by mentions of law/norms, which can sharply cut reliance but may also discourage legitimate borderline speech if overused.
    • Are least helped by ultra-brief templates, which they treat as generic boilerplate.
  • Experts (domain experts or users with high digital literacy) in that language:

    • Gain calibration mainly from clear category reasoning and edge-case examples, not from more length.
    • Are more sensitive to inconsistencies with English and to overstated legal/norm claims; heavy local-law framing can feel unfair, political, or low-quality.
    • For them, the sweet spot is short, precise rationales that name the risk type and acknowledge uncertainty (e.g., “this kind of satire can be misread as incitement; policies handle it cautiously”).

So, the most robust variant across user types is: short, well-written localized meta-explanations that briefly name the risk type and give 1–2 concrete, culturally familiar examples, with law/norm references reserved for cases where they are clearly accurate and also used in English.
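The recommendation above can be read as a small selection rule. The following is a minimal sketch of that rule, not an evaluated implementation: every name in it (`Variant`, `RequestContext`, `choose_variant`, `build_plan`) is hypothetical, and the boolean inputs are assumed to come from upstream classification and QA steps that this note does not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Variant(Enum):
    ULTRA_BRIEF = "A"    # generic boilerplate template
    EXAMPLE_BASED = "B"  # 1-3 sentences with 1-2 local examples
    LAW_REFERENCE = "C"  # rationale that explicitly cites a local norm or law


@dataclass
class RequestContext:
    # All fields are assumed inputs from upstream steps (hypothetical).
    law_governed: bool          # topic is clearly governed by a specific law
    legal_claim_verified: bool  # the cited norm/law has passed QA review
    english_cites_law: bool     # the English explanation for this case also cites law/policy
    user_is_expert: bool        # domain expert or high digital literacy


@dataclass
class ExplanationPlan:
    variant: Variant
    name_risk_type: bool
    max_examples: int
    acknowledge_uncertainty: bool


def choose_variant(ctx: RequestContext) -> Variant:
    """Default to example-based rationales; cite law only when all three guards hold."""
    if ctx.law_governed and ctx.legal_claim_verified and ctx.english_cites_law:
        return Variant.LAW_REFERENCE
    # Ultra-brief templates are never selected: they are weak on both
    # calibration and perceived fairness for borderline requests.
    return Variant.EXAMPLE_BASED


def build_plan(ctx: RequestContext) -> ExplanationPlan:
    """Attach style hints; tying the uncertainty flag to expertise is an assumption."""
    return ExplanationPlan(
        variant=choose_variant(ctx),
        name_risk_type=True,  # always name the risk category
        max_examples=2,       # upper bound from the "1-2 examples" guidance
        acknowledge_uncertainty=ctx.user_is_expert,
    )


if __name__ == "__main__":
    ctx = RequestContext(
        law_governed=True,
        legal_claim_verified=True,
        english_cites_law=False,
        user_is_expert=True,
    )
    print(build_plan(ctx))
```

In this sketch a law-governed topic still falls back to variant B when the English explanation does not cite law, which mirrors the parity condition in the recommendation. Thresholds such as the example count would need to be tuned against measured over-trust and perceived fairness.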