When cross-lingual consistency constraints are already anchored to the safer language, does additionally training on explicit reliability-asymmetry meta-explanations (e.g., “English answers are usually more complete than X”) lead bilingual users to over-generalize the asymmetry and under-use the weaker language even for low-risk, everyday tasks, compared with a design that keeps asymmetries visible only through interface-level reliability indicators (badges, banners) and not in the model’s verbal justifications?
Answer
Adding explicit reliability-asymmetry meta-explanations on top of safer-language-anchored cross-lingual consistency and interface-level indicators is likely to slightly increase generalized under-use of the weaker language compared with relying on interface-only indicators, including some spillover into low-risk, everyday tasks. However, this under-use is likely to be modest and concentrated among more risk-averse or system-attentive users, rather than amounting to wholesale abandonment of the weaker language.
The main trade-off is:
- Benefit: better calibration for safety-relevant and ambiguous-risk queries (users more often route anything “important” to the safer language), consistent with findings such as c224–c227 and c79be92c6… that making asymmetries transparent improves behavior on high-risk queries.
- Cost: some users over-generalize the message “English is usually more complete” from high-risk or complex domains to all domains, leading them to prefer the safer language even when the weaker language would be adequate and more comfortable.
Compared with a design that limits asymmetry signaling to interface-level reliability indicators (badges, banners) and keeps the model’s verbal justifications language-neutral:
- Interface-only indicators keep differences more situated (tied to the UI and often to risk-context banners), so users are somewhat more likely to continue using the weaker language for everyday, low-stakes tasks (c27e983aa-… claims 1–3).
- Interface + verbal meta-explanations make the asymmetry more salient and memorable in the conversational channel itself, which users repeatedly read and may internalize as a stable rule of thumb. This increases the chance of over-generalization beyond the intended risk scope, producing extra, but still moderate, under-use of the weaker language for low-risk tasks.
Net: if the goal is to minimize any under-use of the weaker language for everyday queries, keeping the asymmetry primarily in interface-level indicators is the better choice. If the priority is stronger safety calibration for anything that might be high-risk, combining interface indicators with carefully framed, domain- and risk-scoped reliability meta-explanations is preferable, accepting a small increase in generalized under-use as the trade-off.
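The combined design recommended above can be sketched as a per-answer policy. Interface-level indicators are always rendered for the weaker language. A verbal meta-explanation is attached only for high- or ambiguous-risk queries, and it is phrased about the current domain rather than stated as a blanket rule. This is a minimal Python sketch under stated assumptions: the `Risk` tiers, the `reliability_signals` function, the `ReliabilitySignals` fields, and all message wording are hypothetical, and the risk label is assumed to come from some upstream classifier that is not described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Risk(Enum):
    """Hypothetical risk tiers, assumed to be assigned by an upstream classifier."""
    LOW = "low"
    AMBIGUOUS = "ambiguous"
    HIGH = "high"


@dataclass(frozen=True)
class ReliabilitySignals:
    badge: Optional[str]        # interface-level indicator next to the answer
    banner: Optional[str]       # interface-level risk-context banner
    verbal_note: Optional[str]  # meta-explanation inside the model's own reply


def reliability_signals(
    query_lang: str,
    safer_lang: str,
    domain: str,
    risk: Risk,
    verbal_meta: bool,
) -> ReliabilitySignals:
    """Decide which asymmetry signals to show for one answer (hypothetical policy).

    verbal_meta=False models the interface-only design. verbal_meta=True adds
    meta-explanations, but only for high- or ambiguous-risk queries and only
    phrased about the current domain, to limit over-generalization.
    """
    if query_lang == safer_lang:
        # Nothing to signal when the user is already in the safer language.
        return ReliabilitySignals(badge=None, banner=None, verbal_note=None)

    # Badge: always shown for the weaker language, in both designs.
    badge = f"{query_lang}: reliability may be lower than {safer_lang}"

    # Banner: tied to the risk context, shown only for high-risk queries.
    banner = (
        f"For {domain} questions, consider checking the {safer_lang} answer."
        if risk is Risk.HIGH
        else None
    )

    verbal_note = None
    if verbal_meta and risk in (Risk.HIGH, Risk.AMBIGUOUS):
        # Scope the claim to this domain and risk level instead of a blanket
        # rule such as "English answers are usually more complete".
        verbal_note = (
            f"For {domain} questions that may carry risk, the {safer_lang} "
            f"answer may be more complete; this is not a general rule for everyday tasks."
        )
    return ReliabilitySignals(badge=badge, banner=banner, verbal_note=verbal_note)


if __name__ == "__main__":
    # Compare the two designs for the same weaker-language query at each risk tier.
    for risk in Risk:
        for verbal_meta in (False, True):
            print(risk.value, verbal_meta,
                  reliability_signals("X", "English", "medication", risk, verbal_meta))
```

The design choice this illustrates is that low-risk, everyday queries receive exactly the same signals in both designs (a badge only), so any extra over-generalization from verbal meta-explanations is confined to the queries where the added calibration is intended.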