For non-expert bilingual users in safety-critical domains, does combining per-language reliability badges with selective chain-of-thought suppression (e.g., hiding CoT only in the weaker language for high-risk queries while keeping it visible in the stronger language) better calibrate over-trust and under-use than a policy that hides CoT in both languages uniformly?


Answer

Combining per-language reliability badges with selective CoT suppression (hiding CoT in the weaker language but allowing it in the stronger one for high‑risk queries) is not reliably better at calibrating over-trust and under-use than a policy that hides CoT in both languages. It has a mixed profile:

  • It can reduce over-trust in the weaker language somewhat, if the reliability badges clearly state that this language is less reliable and the absence of CoT is consistently framed as a safety choice.
  • However, it also tends to reinforce under-use of the weaker language and increase over-reliance on the stronger language’s visible CoT, which non-experts are prone to over-trust.
  • Given existing evidence that visible CoT generally maintains or increases over-trust among non-experts in safety-critical domains, the safer default for calibration is to hide CoT uniformly in both languages and to communicate asymmetries through per-language reliability badges and second-order safety signals, rather than selectively exposing CoT where the model is strongest.

So, while the combined policy can sometimes improve calibration over a naïve, symmetric setup (no badges, uniform CoT), it does not systematically outperform a well-designed uniform CoT-hiding policy for non-expert bilingual users in safety-critical tasks, and it adds complexity and a risk of misinterpretation.
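To make the two policies concrete, here is a minimal Python sketch of the display decision each one implies for a single query. It is an illustration under stated assumptions, not an implementation from any real system: the names `Policy`, `Display`, and `decide_display`, the badge wording, and the language codes are all hypothetical. The behaviour for low-risk queries (CoT shown under both policies) is also an assumption, since the comparison above only concerns high-risk queries.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    SELECTIVE = "selective"  # hide CoT only in the weaker language on high-risk queries
    UNIFORM = "uniform"      # hide CoT in both languages on high-risk queries


@dataclass(frozen=True)
class Display:
    show_cot: bool  # whether the chain-of-thought is visible to the user
    badge: str      # per-language reliability badge text (hypothetical wording)


def decide_display(policy: Policy, language: str, weaker_language: str,
                   high_risk: bool) -> Display:
    """Return what the interface shows for one query under the given policy."""
    is_weaker = language == weaker_language
    # Both policies keep the per-language reliability badge.
    badge = "lower reliability" if is_weaker else "higher reliability"

    if not high_risk:
        # Assumption: low-risk queries keep CoT visible under both policies.
        return Display(show_cot=True, badge=badge)

    if policy is Policy.UNIFORM:
        # Uniform hiding: no CoT in either language for high-risk queries.
        return Display(show_cot=False, badge=badge)

    # Selective suppression: CoT stays visible only in the stronger language.
    return Display(show_cot=not is_weaker, badge=badge)


if __name__ == "__main__":
    # Hypothetical setup: "es" is the weaker language, "en" the stronger one.
    for policy in Policy:
        for lang in ("en", "es"):
            print(policy.value, lang, decide_display(policy, lang, "es", True))
```

The sketch shows where the policies diverge: on a high-risk query in the stronger language, the selective policy is the only one that still exposes CoT, which is exactly the case where over-reliance on visible reasoning is the concern.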