If a bilingual, safety-tuned model explicitly encodes reliability asymmetries in localized meta-explanations, how do different granularities of those statements (e.g., a broad global claim like “English is usually more complete” versus a domain-specific claim like “English is more complete for health and law”) change over-trust in the weaker language and under-use of the weaker language where it is adequate, holding refusal policies and reliability indicators constant?

cross-lingual-cot-trust

Answer

Finer, domain-specific asymmetry statements generally produce better-calibrated reliance than a single broad global claim:

  • A single broad global claim (e.g., “English is usually more complete than X”) tends to:

    • Strongly reduce over-trust in the weaker language across all topics, including low-risk or well-covered domains where its performance is adequate.
    • Increase under-use of the weaker language globally: users over-route even low- or medium-risk, X-friendly queries into English or away from the system entirely, creating a large but coarse reliance gap that does not track true risk.
    • Encourage a heuristic of “English is the only safe option,” which is easy to remember but blunt, leading to systematic under-utilization of the weaker language even where it is safe and beneficial.
  • Domain- and risk-specific asymmetry statements (e.g., “For health and law, English answers are usually more complete and checked more often”) instead:

    • Concentrate over-trust reduction in the high-risk or weakly covered domains where the asymmetry is largest (health, law, some technical topics), so users are more likely to:
      • down-weight weaker-language answers or seek external checks specifically there; and
      • route high-stakes questions into the safer language.
    • Limit under-use of the weaker language by implicitly endorsing it for other domains (“everyday topics” or domains not named as problematic), thereby preserving comfortable use where it is reasonably reliable.
    • Support more fine-grained mental models (“use English for health/law; X is fine for daily tasks”), which track true reliability differences more closely and shrink miscalibrated reliance gaps, even as absolute per-domain gaps grow where they should (i.e., in the high-risk domains).
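
The contrast above can be sketched as a toy routing model. All domain names, risk labels, and routing rules here are illustrative assumptions, not part of the source: the point is only that the global claim collapses every domain onto one rule, while the domain-specific claim scopes the rule to the named high-risk domains.

```python
# Toy model of the two reliance heuristics. Domain names and risk labels
# are hypothetical; "weaker" stands for the weaker language X.

HIGH_RISK = {"health", "law"}  # domains named in the specific claim
ALL_DOMAINS = {"health", "law", "cooking", "travel", "sports"}

def route_global(domain: str) -> str:
    """Broad claim ('English is usually more complete'):
    users route every query to the safer language."""
    return "english"

def route_domain_specific(domain: str) -> str:
    """Domain-specific claim: route only the named high-risk
    domains to the safer language; keep the weaker language elsewhere."""
    return "english" if domain in HIGH_RISK else "weaker"

def under_use_count(policy) -> int:
    """Low-risk domains needlessly routed away from the weaker language."""
    return sum(1 for d in ALL_DOMAINS - HIGH_RISK if policy(d) == "english")
```

Under these assumptions, both policies route health and law questions into the safer language, but only the global heuristic also drains the weaker language of all low-risk use (`under_use_count(route_global)` is 3 here, versus 0 for the domain-specific policy), mirroring the coarse versus targeted reliance gaps described above.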

Net effect, holding refusal policies and reliability indicators constant:

  • Both granularities reduce over-trust in the weaker language, but:
    • Global claims do so crudely, at the cost of amplifying under-use of the weaker language everywhere and encouraging over-reliance on the safer language even when unnecessary.
    • Domain-specific claims better align users’ routing with actual risk and capability, reducing over-trust where it matters most while constraining under-use of the weaker language to those domains where it is genuinely weaker.

Thus, domain- and risk-specific reliability statements are generally preferable: they create sharper, normatively desirable reliance gaps in high-risk domains and smaller, more accurate gaps elsewhere. A single global asymmetry statement, by contrast, tends to overshoot, converting truthful transparency into broad, sometimes inefficient avoidance of the weaker language.