Compared with interfaces that hide chain-of-thought entirely, do hybrid interfaces that only surface brief CoT-style reasoning after a user explicitly challenges or asks the model to justify a safety-relevant answer lead to better-calibrated trust (less over-trust on first exposure, without excessive under-trust after justification) for non-expert users on similar problems?


Answer

Hybrid interfaces that reveal only brief chain-of-thought (CoT)-style justifications on demand are plausibly somewhat better than always hiding CoT at calibrating trust for some motivated non-expert users, but they are not a reliably superior general solution. Their main protection against explanation-induced over-trust is the same as full hiding: no CoT is shown on first exposure. The optional justifications can either correct or distort trust, depending on how they are designed and how users interpret the act of asking for one.

More specifically:

  • On first exposure to a safety-relevant answer, hiding CoT by default in the hybrid interface should reduce over-trust roughly as much as fully hiding CoT, assuming final answers and warnings are similar.
  • When users explicitly request justification, brief CoT-style reasoning plus clear uncertainty/safety cues can sometimes improve local calibration (e.g., increasing doubt when the model is actually uncertain). However, many non-experts still treat any coherent reasoning as a reliability cue, which can reintroduce over-trust.
  • Compared with always hiding CoT, well-designed hybrid justifications can avoid large swings into chronic under-trust for users who want some explanation, but they are fragile: if users over-interpret “the system was willing to justify this” as a quality signal, hybrid interfaces can sustain or even heighten over-trust on the subset of queries where justifications are requested.
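
To make "better-calibrated trust" concrete, the sketch below shows one way the two failure modes discussed above could be measured in a user study. It is a minimal illustration under assumptions that are not part of the original answer: the `Trial` record, the `trust_rates` function, and the definitions themselves (over-trust as reliance on an incorrect answer, under-trust as rejection of a correct one) are hypothetical choices, and a real study might use graded confidence ratings instead of binary reliance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """One participant response to one model answer (hypothetical schema)."""
    model_correct: bool  # was the model's final answer actually correct?
    user_relied: bool    # did the participant accept or act on the answer?


def trust_rates(trials: list[Trial]) -> dict[str, Optional[float]]:
    """Return over-trust and under-trust rates for a set of trials.

    Assumed definitions (not from the source):
      over_trust  = share of incorrect answers the user relied on
      under_trust = share of correct answers the user rejected
    A rate is None when there are no trials of the relevant kind.
    """
    wrong = [t for t in trials if not t.model_correct]
    right = [t for t in trials if t.model_correct]
    over = sum(t.user_relied for t in wrong) / len(wrong) if wrong else None
    under = sum(not t.user_relied for t in right) / len(right) if right else None
    return {"over_trust": over, "under_trust": under}


# Toy data for illustration only; these are not study results.
example = [
    Trial(model_correct=False, user_relied=True),
    Trial(model_correct=False, user_relied=False),
    Trial(model_correct=True, user_relied=True),
    Trial(model_correct=True, user_relied=False),
    Trial(model_correct=True, user_relied=True),
    Trial(model_correct=True, user_relied=True),
]
print(trust_rates(example))  # {'over_trust': 0.5, 'under_trust': 0.25}
```

Under these assumed definitions, the claims above would translate into comparing both rates across conditions (always-hidden CoT versus hybrid), computed separately for first-exposure trials and for trials where a justification was requested. A hybrid interface would count as better calibrated only if it lowers one rate without raising the other.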

Overall, for non-expert users and typical safety-relevant tasks, a hybrid on-demand brief-CoT interface is best viewed as a conditional, design-sensitive improvement over always hiding CoT for some users, not as a robust, generally superior way to ensure well-calibrated trust across the board.