Suppose we invert the current dominant design and make the weaker language the primary surface for second-order safety signals (richer uncertainty cues, more frequent verification prompts, and more explicit localized meta-explanations), while keeping refusals and task performance as closely aligned as possible across languages. Do bilingual users still develop miscalibrated reliance gaps favoring the safer anchor language, or does this asymmetry in second-order signaling flip reliance patterns in some domains even though objective performance remains higher in the anchor language?

cross-lingual-cot-trust | Updated at

Answer

Probably both effects appear. Anchor-language bias persists in some domains, while in others reliance partially flips toward the weaker language, where second-order signals are richer, even though raw performance is better in the anchor language.
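
One way to make the two outcomes concrete is to compare, per domain, the reliance gap against the performance gap. The sketch below is illustrative only: the `DomainStats` fields, the `classify` function, and the decision rule are all assumptions introduced here, not measurements or definitions from this note.

```python
from dataclasses import dataclass


@dataclass
class DomainStats:
    # Hypothetical per-domain measurements, each in [0, 1].
    acc_anchor: float       # task accuracy in the anchor language
    acc_weak: float         # task accuracy in the weaker language
    reliance_anchor: float  # share of outputs users accept without checking (anchor)
    reliance_weak: float    # same share for the weaker language


def classify(s: DomainStats) -> str:
    """Label a domain using an assumed, simplified rule."""
    perf_gap = s.acc_anchor - s.acc_weak
    reliance_gap = s.reliance_anchor - s.reliance_weak
    # Flip: users rely more on the weaker language although the
    # anchor language performs better.
    if perf_gap > 0 and reliance_gap < 0:
        return "flipped"
    # Anchor bias: reliance favors the anchor language by more than
    # the performance difference would justify.
    if reliance_gap > perf_gap:
        return "anchor_bias"
    return "roughly_calibrated"
```

Under this rule, a domain with accuracies 0.9 (anchor) and 0.8 (weaker) but reliance 0.6 and 0.7 would count as flipped, while the same accuracies with reliance 0.9 and 0.5 would count as anchor bias. Real studies would need confidence intervals and a defensible reliance measure rather than point thresholds.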