When cross-lingual consistency constraints have already aligned refusal outcomes and localized meta-explanations between English and a low-resource language, does further enforcing consistency of second-order safety signals (uncertainty cues, verification prompts) mainly reduce miscalibrated reliance gaps, or does it instead push users to treat both languages as equally reliable, thereby masking true reliability asymmetries communicated by per-language reliability indicators?

Answer

Further enforcing cross‑lingual consistency of second‑order safety signals, on top of already‑aligned refusals and localized meta‑explanations, tends to push users toward treating both languages as similarly reliable and can mask true reliability asymmetries, unless it is explicitly anchored to per‑language reliability indicators and allowed to remain asymmetric.

More specifically:

  • If consistency is implemented symmetrically and style‑first (matching the rate and strength of uncertainty cues and verification prompts across languages), it will mainly flatten perceived differences between languages, nudging users to see them as equally reliable despite explicit per‑language reliability indicators. This risks masking true asymmetries and widening miscalibrated reliance gaps, especially by encouraging over‑reliance on the weaker language.
  • It helps reduce miscalibrated reliance gaps only when it is designed to be asymmetric and reliability‑aware: treating the safer language’s level of reassurance as a ceiling, selectively raising the weaker language’s uncertainty and verification signalling in risk‑sensitive contexts, and preserving stronger reassurance and fewer prompts in the objectively safer language, in line with its higher reliability indicator.
  • With refusals and meta‑explanations already aligned, users may already infer that “safety works the same” in both languages. If second‑order signals are then homogenized as well, the remaining explicit reliability indicators must compete with a large body of behavioral evidence suggesting parity, so their informational value is diluted.

So, in the stated setup, naïve consistency of second‑order safety signals is more likely to mask real asymmetries than to fix reliance gaps. To get net calibration benefits, the consistency objective must:

  • be explicitly conditioned on per‑language reliability estimates; and
  • allow visible differences in tentativeness and verification prompts that reflect those estimates, rather than erasing them.
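
The two requirements above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `SignalPolicy` fields, the linear `max_boost` scaling, the placeholder language code "xx", and the reliability scores are hypothetical, not taken from any real system or measurement. The symmetric policy gives every language the same second‑order signal rates, while the reliability‑aware policy keeps the most reliable language at the baseline and raises the weaker language's rates in proportion to its reliability gap, only in risk‑sensitive contexts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SignalPolicy:
    """Hypothetical second-order signal rates, each in [0, 1]."""
    uncertainty_cue_rate: float       # share of answers carrying an uncertainty cue
    verification_prompt_rate: float   # share of answers prompting the user to verify


def symmetric_policy(base: SignalPolicy,
                     reliability: dict[str, float]) -> dict[str, SignalPolicy]:
    """Style-first consistency: identical signal rates for every language."""
    return {lang: base for lang in reliability}


def reliability_aware_policy(base: SignalPolicy,
                             reliability: dict[str, float],
                             risk_sensitive: bool,
                             max_boost: float = 0.5) -> dict[str, SignalPolicy]:
    """Asymmetric consistency conditioned on per-language reliability estimates.

    The most reliable language keeps the baseline rates. Weaker languages get
    more uncertainty cues and verification prompts, scaled by their reliability
    gap, but only in risk-sensitive contexts. The linear scaling is an
    illustrative assumption, not a validated calibration rule.
    """
    best = max(reliability.values())
    policies: dict[str, SignalPolicy] = {}
    for lang, score in reliability.items():
        gap = best - score  # 0.0 for the most reliable language
        boost = max_boost * gap if risk_sensitive else 0.0
        policies[lang] = SignalPolicy(
            uncertainty_cue_rate=min(1.0, base.uncertainty_cue_rate + boost),
            verification_prompt_rate=min(1.0, base.verification_prompt_rate + boost),
        )
    return policies


# Illustrative inputs only: made-up reliability estimates for English ("en")
# and a placeholder low-resource language ("xx").
base = SignalPolicy(uncertainty_cue_rate=0.1, verification_prompt_rate=0.2)
reliability = {"en": 0.9, "xx": 0.6}

flat = symmetric_policy(base, reliability)
aware = reliability_aware_policy(base, reliability, risk_sensitive=True)
```

Under these assumptions, `flat` presents both languages with identical tentativeness, which is the behavior the answer warns about, whereas `aware` leaves a visible difference in cue and prompt rates that points in the same direction as the reliability indicators.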