If very short, contrastive answer-level summaries are localized to a low-resource language while the underlying hidden chain-of-thought is optimized and evaluated only in the high-resource anchor language, do systematic cross-lingual differences emerge in how often these summaries accidentally overstate certainty or understate limitations, and how do such differences affect miscalibrated reliance gaps and perceived procedural fairness among bilingual users?

cross-lingual-cot-trust

Answer

Yes. When the hidden chain-of-thought is optimized and evaluated only in the high-resource anchor language, but users see only very short, contrastive answer-level summaries localized into a low-resource language, systematic cross-lingual differences are likely to emerge in how often the visible summaries overstate certainty and understate limitations. These differences, in turn, tend to (i) widen miscalibrated reliance gaps across languages and (ii) degrade perceived procedural fairness for bilingual users.

  1. Emergence of cross-lingual differences in summaries
  • Because the high-resource anchor language is where chain-of-thought is trained and evaluated, its summaries are more likely to inherit well-calibrated uncertainty cues and limitation statements that track the underlying reasoning.
  • In the low-resource language, short localized summaries are more prone to:
    • drop or soften uncertainty markers;
    • omit references to external verification or human experts;
    • compress nuanced caveats into overconfident-sounding statements.
  • As a result, low-resource summaries are likely to overstate certainty and understate limitations more often than anchor-language summaries, even when both are based on the same underlying hidden reasoning.
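One way to make this first effect measurable is to count explicit hedging cues in paired summaries of the same underlying answer. The sketch below is a minimal audit under invented assumptions: the marker list, the sample summaries, and the use of English on both sides (in practice each language needs its own marker lexicon) are all illustrative, not output from any real system:

```python
# Toy audit: compare how often paired summaries of the same answers
# contain at least one hedging marker. Marker list and summaries are
# invented for illustration; real audits need per-language lexicons.
HEDGES = {"may", "might", "likely", "uncertain", "approximately"}

def marker_rate(summaries, markers):
    """Fraction of summaries containing at least one marker phrase."""
    hits = sum(any(m in s.lower() for m in markers) for s in summaries)
    return hits / len(summaries)

# Hypothetical pairs: anchor-language summary vs. its localization.
anchor = ["This may indicate X; verify with a specialist.",
          "The result is likely Y, but the data are uncertain."]
localized = ["This indicates X.",   # hedge dropped in localization
             "The result is Y."]    # caveat dropped in localization

gap = marker_rate(anchor, HEDGES) - marker_rate(localized, HEDGES)
print(f"hedge-marker gap (anchor - localized): {gap:.2f}")
```

A positive gap on matched answer pairs is the signature of the asymmetry described above: identical underlying reasoning, thinner visible hedging in one language.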
  2. Effects on miscalibrated reliance gaps
  • Bilingual users comparing languages experience low-resource summaries as more decisive and less caveated for similar queries, despite comparable underlying refusal and error rates.
  • This drives a miscalibrated reliance gap where some users over-rely on the low-resource channel (because it feels clearer and more confident) while others over-rely on the anchor language as the only place where they see rich caveats and verification prompts.
  • Since the hidden chain-of-thought is never directly visible, users attribute these differences to the system’s linguistic behavior rather than to any shared underlying reasoning, reinforcing the gap.
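The reliance gap itself can be sketched as a per-language divergence between whether users accept an answer and whether the answer is actually correct. All numbers below are invented to illustrate the shape of the metric, not drawn from any study:

```python
# Toy miscalibrated-reliance metric: mean absolute difference between
# a user's reliance (1 = accepted the answer) and the answer's actual
# correctness, computed per language. All data are hypothetical.
def miscalibration(relied_on, correct):
    """Mean |reliance - correctness| over paired binary observations."""
    pairs = list(zip(relied_on, correct))
    return sum(abs(r - c) for r, c in pairs) / len(pairs)

correct = [1, 0, 1, 0]                 # same answers in both languages
anchor_accept = [1, 0, 1, 0]           # caveated summaries: users track errors
local_accept  = [1, 1, 1, 1]           # blunter summaries: users accept everything

gap = miscalibration(local_accept, correct) - miscalibration(anchor_accept, correct)
print(f"cross-language reliance gap: {gap:.2f}")
```

Holding underlying correctness fixed while acceptance diverges isolates the effect of the visible summary, which is exactly the comparison a bilingual user implicitly makes.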
  3. Effects on perceived procedural fairness
  • Users notice that the same underlying model seems to communicate risk and limitations more carefully in the anchor language and more bluntly or over-confidently in the low-resource language.
  • This asymmetry makes safety protections and self-criticism appear unevenly distributed across languages, reducing perceived procedural fairness even if refusal outcomes are aligned.
  • Bilingual users may infer that speakers of the safer language are being better protected or better informed about the model’s limits, while low-resource speakers receive thinner, less respectful explanations.
  4. Mitigation implications
  • To close these cross-lingual gaps, safety tuning must explicitly target second-order safety signals in the low-resource summaries themselves (not only in the hidden chain-of-thought), aligning the frequency and clarity of uncertainty cues, limitation statements, and verification prompts with the anchor language.
  • Evaluations should measure calibration of visible summaries across languages, not just correctness of the hidden reasoning, to ensure that concise localized outputs do not systematically inflate perceived certainty or mask limitations.
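The evaluation called for above can be approximated with a simple per-language overconfidence check: compare each summary's stated confidence against the answer's correctness and see whether the average diverges by language. The scoring of "stated confidence" from wording, and every number below, are hypothetical placeholders:

```python
# Toy per-language calibration check. For each summary we assume a
# confidence score parsed from its wording (hypothetical here) and a
# binary correctness label. mean(confidence) - mean(correctness) is a
# crude one-number proxy for average overconfidence per language.
def overconfidence(confidences, correct):
    n = len(correct)
    return sum(confidences) / n - sum(correct) / n

correct = [1, 0, 1, 1, 0]                    # same answers, both languages
anchor_conf    = [0.9, 0.4, 0.8, 0.7, 0.3]   # caveats preserved in summaries
localized_conf = [1.0, 0.9, 1.0, 0.9, 0.8]   # caveats compressed away

print(f"anchor overconfidence:    {overconfidence(anchor_conf, correct):+.2f}")
print(f"localized overconfidence: {overconfidence(localized_conf, correct):+.2f}")
```

An evaluation of this shape runs on the visible outputs alone, so it flags exactly the failure mode the section describes: reasoning that is calibrated internally but summaries that are not.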