For non-expert users in safety-critical domains, does intermittently withholding chain-of-thought (CoT) on a random subset of otherwise similar queries (while still providing final answers) reduce over-trust more effectively than always hiding CoT, by making users experience that they can and sometimes must rely on their own judgment—without reducing decision accuracy?
cross-lingual-cot-trust
Answer
Intermittently withholding chain-of-thought (CoT) on a random subset of otherwise similar queries is unlikely to reduce over-trust more reliably than always hiding CoT for non-expert users in safety-critical domains, and it adds risks for decision accuracy and trust calibration.
- As a trust-calibration intervention, random intermittent withholding sends a noisy and hard-to-interpret signal: users see that explanations sometimes disappear but are not given a clear rule about when or why, so many will still generalize their trust in the AI’s reasoning from CoT-visible cases to CoT-hidden cases. This weakens the intended “you must sometimes rely on your own judgment” lesson and can even create confusion or magical thinking about when the system is more or less reliable.
- For non-experts in safety-critical tasks, over-trust is driven more by the perceived authority and availability of AI guidance than by fine-grained patterns in when CoT is shown. Hiding CoT consistently, combined with clear warnings and requirements for independent checks or human oversight, is a more straightforward and robust way to avoid explanation-induced over-trust than randomly withholding CoT.
- From a decision-accuracy standpoint, intermittent withholding does not inherently preserve or improve accuracy compared with always hiding CoT. Because non-experts typically cannot reliably extract extra accuracy from CoT (and may be distracted or misled by it), removing CoT on a random subset usually neither helps nor harms accuracy systematically; the main effect is on user experience and perceived system coherence, not on correctness.
Given current evidence and analogous findings about CoT and trust, a better design for safety-critical, non-expert workflows is to (i) avoid exposing detailed CoT by default, (ii) pair final answers with explicit uncertainty and usage constraints, and (iii) use structured verification tasks or post-hoc teaching examples (including occasional known mistakes) outside the live decision, rather than relying on random CoT withholding as the primary lever for reducing over-trust.
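The recommended design above can be sketched in code. This is a minimal, hypothetical illustration (the types `ModelOutput`, `UserFacingResponse`, and the function `render_for_nonexpert` are assumptions, not a real library API): CoT is never copied into the user-facing response, and every answer carries explicit uncertainty and usage constraints instead.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModelOutput:
    """Raw model result (hypothetical shape for illustration)."""
    answer: str
    chain_of_thought: str   # internal reasoning trace
    confidence: float       # assumed calibrated probability in [0, 1]

@dataclass
class UserFacingResponse:
    """What a non-expert user actually sees."""
    answer: str
    uncertainty_note: str
    usage_constraints: List[str] = field(default_factory=list)

def render_for_nonexpert(out: ModelOutput) -> UserFacingResponse:
    """Hide CoT by default (design point i) and attach explicit
    uncertainty and usage constraints (design point ii)."""
    note = (
        f"Model confidence: {out.confidence:.0%}. "
        "This output has not been independently verified."
    )
    constraints = [
        "Do not act on this answer alone in a safety-critical setting.",
        "Confirm against the applicable checklist or a qualified reviewer.",
    ]
    # The chain_of_thought field is deliberately never placed in the
    # response, so trust cannot be inflated by a persuasive explanation.
    return UserFacingResponse(out.answer, note, constraints)
```

Design point (iii), structured verification outside the live decision, would live in a separate training or review workflow rather than in this response path; the key property here is simply that the CoT-free presentation is consistent for every query, not randomized.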