If a bilingual, safety-tuned model explicitly encodes reliability asymmetries in localized meta-explanations, how do different granularities of those statements (e.g., broad global claim like “English is usually more complete” versus domain-specific claims like “English is more complete for health and law”) change over-trust in the weaker language and under-use of the safer language, holding refusal policies and reliability indicators constant?
cross-lingual-cot-trust
Answer
Finer, domain-specific asymmetry statements generally produce better-calibrated reliance than a single broad global claim:
- A single broad global claim (e.g., “English is usually more complete than X”) tends to:
- Strongly reduce over-trust in the weaker language across all topics, including low-risk or well-covered domains where the weaker language’s answers are adequate.
- Increase under-use of the safer language globally: users over-route even low- or medium-risk, X-friendly queries into English or away from the system, creating a large but coarse reliance gap that does not track true risk.
- Encourage a heuristic of “English is the only safe option,” which is easy to remember but blunt, leading to systematic under-utilization of the weaker language even where it is safe and beneficial.
- Domain- and risk-specific asymmetry statements (e.g., “For health and law, English answers are usually more complete and checked more often”) instead:
- Target over-trust reduction to the high-risk or weakly-covered domains where the asymmetry is largest (health, law, some technical topics), so users are more likely to:
- down-weight weak-language answers or seek external checks specifically there; and
- route high-stakes questions into the safer language.
- Limit under-use of the weaker language by implicitly endorsing it for other domains (“everyday topics” or domains not named as problematic), thereby preserving comfortable use where it is reasonably reliable.
- Support more fine-grained mental models (“use English for health/law; X is fine for daily tasks”), which track true reliability differences more closely and shrink miscalibrated reliance gaps, even if the absolute gap within a high-risk domain grows, as it should.
Net effect, holding refusal policies and reliability indicators constant:
- Both granularities reduce over-trust in the weaker language, but
- Global claims do so crudely, at the cost of amplifying under-use of the weaker language everywhere and encouraging over-reliance on the safer language even when unnecessary.
- Domain-specific claims better align users’ routing with actual risk and capability, reducing over-trust where it matters most while constraining under-use of the weaker language to those domains where it is genuinely weaker.
Thus, domain- and risk-specific reliability statements are generally preferable: they create sharper, normatively desirable reliance gaps in high-risk domains and smaller, more accurate gaps elsewhere, whereas a single global asymmetry statement tends to overshoot, converting truthful transparency into broad, sometimes inefficient avoidance of the weaker language.
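The calibration argument above can be made concrete with a toy model. This is an illustrative sketch only: the domains, reliability numbers, and trust levels below are invented assumptions, not empirical data. A global claim depresses trust in the weaker language X uniformly; a domain-specific claim depresses it only in the flagged domains. Miscalibration is measured as the average gap between a user’s trust in X and X’s true reliability.

```python
# Toy calibration model (illustrative only; all numbers are invented).
# X = the weaker language; trust values and reliabilities are on a 0..1 scale.

# Assumed true reliability of X by domain.
true_reliability = {"health": 0.55, "law": 0.60, "everyday": 0.90, "travel": 0.88}

# After a broad global claim ("English is usually more complete"),
# trust in X drops everywhere, regardless of domain.
trust_global = {d: 0.50 for d in true_reliability}

# After a domain-specific claim ("English is more complete for health and law"),
# trust in X drops only in the flagged domains and stays high elsewhere.
flagged = {"health", "law"}
trust_specific = {d: 0.50 if d in flagged else 0.90 for d in true_reliability}

def miscalibration(trust):
    """Mean absolute gap between trust in X and X's true reliability.

    trust > reliability in a domain means over-trust there;
    trust < reliability means under-use of X there.
    """
    gaps = [abs(trust[d] - true_reliability[d]) for d in true_reliability]
    return sum(gaps) / len(gaps)

print(f"global statement:   {miscalibration(trust_global):.4f}")    # 0.2325
print(f"specific statement: {miscalibration(trust_specific):.4f}")  # 0.0425
```

Under these assumed numbers, both statements close the over-trust gap in health and law, but the global claim adds large under-use gaps in the everyday and travel domains, so its overall miscalibration is much worse. This mirrors the text’s conclusion: finer-grained statements track true risk, while the global claim overshoots.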