For non-expert users, does a brief, standardized pre-interaction onboarding that explicitly demonstrates model fallibility using 2–3 realistic examples (with short, uncertainty-rich summaries instead of full chain-of-thought) lead to more durable reductions in over-trust and equal or better task accuracy than relying solely on in-line interface cues such as “potentially misleading reasoning” labels or bare answers without any onboarding, on the same classes of problems?
cross-lingual-cot-trust
Answer
A brief, standardized pre-interaction onboarding that uses 2–3 realistic fallibility demonstrations with short, uncertainty-rich summaries is plausibly somewhat better at reducing over-trust than relying only on in-line cues (like “potentially misleading reasoning” labels) or bare answers without onboarding, but it is not reliably superior. It is also unlikely to systematically improve task accuracy, though a well-designed onboarding can usually match it. Its benefits are fragile: they depend strongly on user attention, context, and how closely later tasks resemble the onboarding examples.
Over-trust (durability):
- Compared to no onboarding with bare answers, a one-time onboarding can create a salient early impression that the model is fallible and sometimes confidently wrong. For users who actually attend to it, this can produce moderate reductions in over-trust that outlast what bare answers alone would produce, at least over the next several similar tasks.
- Compared to relying only on in-line cues such as “potentially misleading reasoning” labels shown during interaction, onboarding shifts some of the cognitive load to a dedicated, low-stakes phase. This can help some users form a more coherent mental model (“this system sometimes makes realistic errors; I should stay skeptical”), which may persist longer than impressions built from piecemeal exposure to in-line warnings.
- However, as with uncertainty-rich summaries and other light-touch explanations, many non-experts read fluent summaries and UI polish as credibility signals. If the onboarding is too quick, too generic, or perceived as boilerplate, its effect often decays rapidly, and over-trust reverts toward whatever level the ongoing interaction drives (answer style, frequency of obvious errors, etc.). Onboarding is therefore not a robust, standalone fix and will not reliably outperform well-designed in-line cues for all users.
Task accuracy:
- Onboarding that uses short, uncertainty-rich summaries (rather than full chain-of-thought) and clearly separates didactic examples from real tasks is unlikely to harm accuracy and can usually match the performance of bare-answer or in-line-cue baselines on similar problems (see the sketch after this list).
- For typical non-expert users, it does not consistently increase accuracy, because understanding that the model is fallible does not automatically translate into better checking strategies or domain knowledge. Some users may avoid the most egregious misuses (e.g., treating a single answer as definitive in obviously high-stakes situations), but large, systematic accuracy gains are unlikely.
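To make those design constraints concrete, here is a minimal, purely illustrative sketch of a gated onboarding flow: a small record type for each fallibility demonstration (realistic task, flawed answer, short uncertainty-rich summary, correction) and a session that shows every demonstration before any real task is served. All names (`FallibilityDemo`, `run_onboarding`, `start_session`) and the demo content are hypothetical assumptions, not drawn from any specific system.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class FallibilityDemo:
    """One didactic example shown during onboarding (hypothetical schema)."""
    task: str          # a realistic problem of the kind users will later face
    model_answer: str  # the model's confident but flawed response
    summary: str       # short, uncertainty-rich summary (not full chain-of-thought)
    correction: str    # what a careful check would have revealed

# Hypothetical onboarding content: 2-3 realistic demonstrations.
ONBOARDING_DEMOS: Sequence[FallibilityDemo] = (
    FallibilityDemo(
        task="When was the treaty signed?",
        model_answer="It was signed in 1987.",  # stated with unwarranted confidence
        summary=("The model sounds certain, but dates are a common failure "
                 "mode; it can be wrong even when the phrasing is confident."),
        correction="A quick source check shows the correct year is 1989.",
    ),
    # ... one or two more demos of the same shape
)

def run_onboarding(demos: Sequence[FallibilityDemo],
                   show: Callable[[FallibilityDemo], None]) -> None:
    """Present every demo in a dedicated, low-stakes phase; `show` is
    whatever UI callback renders a single demonstration."""
    for demo in demos:
        show(demo)  # didactic examples stay clearly separate from real tasks

def start_session(demos: Sequence[FallibilityDemo],
                  show: Callable[[FallibilityDemo], None],
                  serve_tasks: Callable[[], None]) -> None:
    """Gate real interaction behind onboarding completion."""
    run_onboarding(demos, show)  # onboarding must finish first
    serve_tasks()                # only then does the real task flow begin
```

The structural choices mirror the bullets above: summaries are short and hedged rather than full reasoning traces, and the demonstrations run in a phase that is visibly distinct from the real task flow.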
Comparative summary:
- Versus bare answers with no onboarding: onboarding is plausibly better at reducing over-trust for users who pay attention, with roughly similar task accuracy, but the effect is variable and context-dependent.
- Versus in-line cues alone (e.g., “potentially misleading reasoning” labels): a brief onboarding can complement or mildly outperform in-line cues in shaping initial trust calibration, but it is not reliably stronger across all users or time horizons; combining both is more promising than using either in isolation.
Overall, a short pre-interaction onboarding with realistic fallibility examples and uncertainty-rich summaries is best seen as a useful, low-risk supplement: it can sometimes yield more durable reductions in over-trust than bare answers or labels alone, but it is not a universally superior replacement. In higher-stakes settings it should be paired with ongoing uncertainty cues, guardrails, and external-check scaffolding.