When a safety-tuned bilingual model uses cross-lingual consistency constraints for refusals but intentionally allows language-specific variation in second-order safety signals (e.g., stronger uncertainty cues in the empirically weaker language, milder cues in the stronger language), does this asymmetry better calibrate bilingual users’ reliance (reducing miscalibrated reliance gaps) than enforcing fully symmetric second-order consistency, and how large is the trade-off in perceived procedural fairness?
Answer
Allowing intentional, reliability-aware asymmetry in second-order safety signals generally improves calibration of bilingual users’ reliance compared with enforcing fully symmetric second-order consistency, and it can meaningfully reduce miscalibrated reliance gaps. The main trade-off is a moderate hit to perceived procedural fairness, which is smaller when asymmetry is transparently justified and carefully framed.
In more concrete terms:
- Reliance calibration:
  - Designing the weaker language to carry stronger and more frequent uncertainty cues and verification prompts, while allowing the stronger language to sound somewhat more confident (yet still caveated in high-risk contexts), usually reduces over-trust in the weaker language and helps bilingual users route more high-stakes reliance toward the objectively safer channel or to external checks.
  - Compared with fully symmetric second-order consistency (same apparent tentativeness both ways), this asymmetric design shrinks miscalibrated reliance gaps: users are less likely to treat both languages as equally reliable when they are not, and less likely to over-rely on the weaker channel simply because refusals look aligned.
  - Symmetric consistency that flattens second-order cues across languages can instead obscure true reliability differences, increasing either (i) over-trust in the weaker language or (ii) under-use of the safer-but-less-familiar language.
- Procedural fairness trade-off:
  - Bilingual users notice that one language sounds systematically more tentative and more directive about verification. This can feel procedurally uneven, especially if the asymmetry is subtle or unexplained, and leads to a moderate but not extreme reduction in perceived fairness relative to a perfectly symmetric presentation.
  - However, when the asymmetry is (a) grounded in real reliability differences, (b) signaled in respectful, localized meta-explanations, and (c) framed as a temporary, monitored condition rather than an inherent hierarchy, users tend to view the system as more honest than unfair. In those cases, the fairness cost is limited and can be partially offset by gains in transparency.
Net effect: under realistic reliability asymmetries, a deliberately asymmetric, risk-sensitive second-order design is usually preferable to fully symmetric consistency. It improves reliance calibration and reduces miscalibrated reliance gaps, at the price of a noticeable but manageable reduction in perceived procedural fairness that can be mitigated with good framing and explicit transparency.
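
One way to make the "miscalibrated reliance gap" discussed above measurable is sketched below. This is a minimal illustration and not a method taken from the answer itself: it assumes that, for each language, both actual reliability and user reliance can be summarized as a rate in [0, 1]. The names `ChannelStats`, `reliance_gap`, and `miscalibration` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ChannelStats:
    """Per-language summary (hypothetical fields, assumed to lie in [0, 1])."""
    reliability: float  # share of responses that were actually correct/safe
    reliance: float     # share of responses users acted on without verifying


def reliance_gap(stats: ChannelStats) -> float:
    """Signed gap: positive means over-reliance, negative means under-reliance."""
    return stats.reliance - stats.reliability


def miscalibration(channels: Mapping[str, ChannelStats]) -> float:
    """Mean absolute reliance gap across languages (lower is better calibrated)."""
    if not channels:
        raise ValueError("need at least one language channel")
    total = sum(abs(reliance_gap(s)) for s in channels.values())
    return total / len(channels)
```

Under this sketch, an asymmetric cue design would count as better calibrated if it produced a lower `miscalibration` value than the symmetric design on the same tasks. The perceived-fairness cost is not captured here and would need its own measure, for example a survey rating collected alongside the reliance data.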