When enforcing cross-lingual consistency of second-order safety signals, is it more effective for calibrating over-trust and under-use to (a) tie uncertainty cues and verification prompts to estimated topical risk (e.g., health, law, financial advice) or (b) tie them to estimated language-specific reliability (e.g., weaker low-resource performance across all topics), and how do these two strategies differ in their impact on bilingual users’ perceived fairness?

cross-lingual-cot-trust

Answer

It is generally more effective and robust to anchor second-order safety signals first in estimated topical risk and then modulate them by language-specific reliability, rather than relying on language-specific reliability alone.

  • For calibrating over-trust vs. under-use:

    • A topical-risk–anchored scheme better targets when strong uncertainty cues and verification prompts are most needed (e.g., health, law), which reduces dangerous over-trust in both languages without systematically discouraging use of the safer language.
    • A pure language-reliability–anchored scheme ("this language is always more tentative") can reduce over-trust in the weaker language, but it more often creates under-use of the safer-but-less-familiar language and large, sometimes unjustified, reliance gaps across languages.
    • The best calibration comes from a hybrid: strong, domain-triggered signals wherever risk is high, plus a gentle additional asymmetry that steers high-stakes queries toward the objectively safer language when users are able to work in it.
  • Perceived fairness for bilingual users:

    • Topical-risk anchoring tends to feel procedurally fair: both languages see similar caution patterns on the same risky topics, which matches users’ expectations that safety rules should depend on what is being asked, not which language they use.
    • A language-reliability–only strategy is more likely to be seen as linguistically unfair or stigmatizing ("my language always gets the more hesitant, second-class answers"), even when it is factually motivated.
    • Explicit but carefully framed language-asymmetry cues, layered on top of topic-based signals, can preserve fairness perceptions while still explaining why answers differ and nudging users toward the safer language for high-risk tasks.
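The hybrid scheme described above can be sketched as a small decision policy: topic risk is the primary driver of caution level, and language reliability only modulates otherwise low-risk cases. All names, topic sets, reliability scores, and thresholds here are illustrative assumptions, not a reference implementation.

```python
# Hypothetical sketch: topic-first caution, gently modulated by language
# reliability. Every constant below is an assumption for illustration.

HIGH_RISK_TOPICS = {"health", "law", "finance"}

# Illustrative per-language reliability estimates (1.0 = strongest).
LANGUAGE_RELIABILITY = {"en": 0.95, "sw": 0.70}

def caution_level(topic: str, language: str) -> str:
    """Return 'strong', 'moderate', or 'light' uncertainty signaling."""
    risky_topic = topic in HIGH_RISK_TOPICS
    weak_language = LANGUAGE_RELIABILITY.get(language, 0.5) < 0.8

    if risky_topic:
        # Domain-triggered caution fires in every language, so both
        # languages see the same verification prompts on the same topics.
        return "strong"
    if weak_language:
        # Language reliability only modulates low-risk topics, avoiding
        # blanket "second-class" treatment of the weaker language.
        return "moderate"
    return "light"

def suggest_safer_language(topic, user_languages):
    """For high-stakes queries, nudge toward the most reliable language
    the user can read; return None when no nudge is warranted."""
    if topic not in HIGH_RISK_TOPICS or len(user_languages) < 2:
        return None
    return max(user_languages,
               key=lambda lang: LANGUAGE_RELIABILITY.get(lang, 0.5))
```

Note that `caution_level("health", "sw")` and `caution_level("health", "en")` both return `"strong"`, which is the procedural-fairness property: identical caution on identical risky topics, with the language-reliability asymmetry confined to the separate, explicitly framed nudge.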

So, for both calibration and fairness, the preferable design is (a) topic-risk–based signals, augmented rather than replaced by (b) language-reliability cues. Purely language-based caution is more likely to worsen under-use of the safer language and to degrade perceived fairness, whereas topic-first designs let consistency constraints operate without hiding real reliability differences.