If users can ask a shopping agent to generate counterfactual comparison tables in chat (e.g., “show what I’d see if I had weighted freshness twice as much” or “as if I cared only about recent reviews”), how does exposure to these side‑by‑side alternative rankings change calibrated trust, anchoring on the default view, and merchants’ strategies for optimizing toward a single global ranking versus robustness across multiple plausible weightings?
conversational-product-discovery
Answer
Counterfactual comparison tables tend to have three effects. (a) They weaken anchoring on a single default view, but only for users who actually inspect the side‑by‑side tables. (b) They improve local calibration, meaning users better understand that a ranking is conditional on its weights, while sometimes increasing global over‑trust in the system’s overall competence. (c) They shift merchant incentives away from optimizing only for a single global ranking and toward keeping offers competitive across a small set of predictable weight patterns, unless the UI or policy still privileges one default view in traffic and explanations.
Users and trust
- Seeing multiple weightings on the same query teaches that rankings are contingent; some users trust the system more as a tool but individual tables less as ground truth.
- Calibration improves when each table has brief, clear weight labels and consistent freshness cues; it worsens if tables differ but the agent does not explain why.
- A polished, easy counterfactual feature can raise unearned global trust: users may assume the system has explored “all” reasonable weightings and thus over‑trust how completely the alternatives have been covered.
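To make “rankings are conditional on weights” concrete, here is a minimal Python sketch of how an agent might build labeled side‑by‑side tables. The item names, feature values, profile labels, and the linear weighted‑sum scoring are all illustrative assumptions, not a description of any real system.

```python
from typing import Dict, List

Features = Dict[str, float]

# Hypothetical catalog: feature values assumed normalized to [0, 1], higher is better.
ITEMS: Dict[str, Features] = {
    "A": {"price": 0.9, "freshness": 0.2, "recent_reviews": 0.5},
    "B": {"price": 0.5, "freshness": 0.9, "recent_reviews": 0.6},
    "C": {"price": 0.6, "freshness": 0.6, "recent_reviews": 0.9},
}

# Hypothetical weight profiles; each label is what the user would see above a table.
PROFILES: Dict[str, Features] = {
    "default": {"price": 1.0, "freshness": 1.0, "recent_reviews": 1.0},
    "freshness x2": {"price": 1.0, "freshness": 2.0, "recent_reviews": 1.0},
    "recent reviews only": {"price": 0.0, "freshness": 0.0, "recent_reviews": 1.0},
}


def rank(items: Dict[str, Features], weights: Features) -> List[str]:
    """Order item names by weighted-sum score, best first (ties broken by name)."""
    def score(name: str) -> float:
        return sum(weights[f] * items[name][f] for f in weights)
    return sorted(items, key=lambda name: (-score(name), name))


def comparison_table(
    items: Dict[str, Features], profiles: Dict[str, Features]
) -> Dict[str, List[str]]:
    """Return one ranking per labeled weight profile, for side-by-side display."""
    return {label: rank(items, weights) for label, weights in profiles.items()}


TABLE = comparison_table(ITEMS, PROFILES)
for label, order in TABLE.items():
    # Print the weight label next to each ranking so the condition is explicit.
    print(f"{label:<22} {' > '.join(order)}")
```

Printing the label alongside each ranking mirrors the point about brief, clear weight labels: the same items reorder, and the reader can see which weighting produced which order.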
Anchoring
- Side‑by‑side views reduce pure anchoring on the original default, but users often re‑anchor on the first counterfactual that matches their self‑image (e.g., “I’m freshness‑first”).
- If the UI visually emphasizes one column (e.g., default highlighted, others secondary), anchoring drifts back to that emphasized view.
Merchant strategy
- If users actually invoke counterfactuals and the agent references them in its explanations, merchants see value in robustness: products that remain near the top under several common weightings (price‑first, freshness‑first, review‑recent) gain an advantage.
- When traffic, ads, and badges still hinge mainly on the default ranking, merchants continue to optimize primarily for that view and treat other weightings as edge cases.
- Clear, named weight profiles (e.g., “Freshness‑heavy,” “Recent reviews only”) that users frequently invoke concentrate merchant targeting on a few popular profiles rather than a single global ranking.
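One way to picture “robustness across weightings” is a worst‑case rank over a handful of named profiles. The sketch below is a hedged illustration only: the three products, their feature values, the profile weights, and the choice of worst‑case rank as the robustness measure are assumptions made for the example.

```python
from typing import Dict

Features = Dict[str, float]

# Hypothetical products with features assumed normalized to [0, 1], higher is better.
ITEMS: Dict[str, Features] = {
    "specialist": {"price": 1.0, "freshness": 0.3, "recent_reviews": 0.3},
    "all_rounder": {"price": 0.7, "freshness": 0.7, "recent_reviews": 0.7},
    "fresh_pick": {"price": 0.3, "freshness": 1.0, "recent_reviews": 0.6},
}

# Hypothetical canonical profiles, named after the weightings mentioned above.
PROFILES: Dict[str, Features] = {
    "price-first": {"price": 3.0, "freshness": 1.0, "recent_reviews": 1.0},
    "freshness-first": {"price": 1.0, "freshness": 3.0, "recent_reviews": 1.0},
    "review-recent": {"price": 1.0, "freshness": 1.0, "recent_reviews": 3.0},
}


def rank_positions(items: Dict[str, Features], weights: Features) -> Dict[str, int]:
    """Map each item to its 1-based rank under a weighted-sum score."""
    ordered = sorted(
        items,
        key=lambda name: (-sum(weights[f] * items[name][f] for f in weights), name),
    )
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def robustness_report(
    items: Dict[str, Features], profiles: Dict[str, Features], k: int
) -> Dict[str, Dict[str, int]]:
    """For each item: worst rank across profiles, and how many profiles place it in the top k."""
    per_profile = {label: rank_positions(items, w) for label, w in profiles.items()}
    report = {}
    for name in items:
        ranks = [per_profile[label][name] for label in profiles]
        report[name] = {
            "worst_rank": max(ranks),
            "top_k_count": sum(r <= k for r in ranks),
        }
    return report


REPORT = robustness_report(ITEMS, PROFILES, k=1)
# The most robust item is the one whose worst-case rank is best (smallest).
MOST_ROBUST = min(REPORT, key=lambda name: REPORT[name]["worst_rank"])
print(REPORT)
print("most robust:", MOST_ROBUST)
```

In this toy data every product wins exactly one profile, but only the balanced one never falls to last place. That is the kind of advantage a merchant would chase if several weightings, rather than one default, drove traffic.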
Design implications (concise)
- Make weights and differences explicit per table; show a small delta summary (“3 items move into top 5 under freshness‑heavy”).
- Avoid visually over‑privileging the default view if the goal is to weaken anchoring.
- Log and report performance across several canonical weightings so merchant incentives align with robustness, not just the default.
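The delta summary suggested above can be computed directly from two ranked lists. This is a minimal sketch under stated assumptions: the function names, the example orderings, and the exact summary wording are hypothetical, and a real agent would take its rankings from whatever scoring it actually uses.

```python
from typing import List, Tuple


def top_k_delta(
    default_order: List[str], alt_order: List[str], k: int
) -> Tuple[List[str], List[str]]:
    """Return (items entering the top k, items leaving the top k) when moving
    from the default ranking to an alternative ranking."""
    default_top = set(default_order[:k])
    alt_top = set(alt_order[:k])
    entered = [item for item in alt_order[:k] if item not in default_top]
    left = [item for item in default_order[:k] if item not in alt_top]
    return entered, left


def delta_summary(
    default_order: List[str], alt_order: List[str], k: int, label: str
) -> str:
    """Build a one-line summary of how many items enter the top k under `label`."""
    entered, _ = top_k_delta(default_order, alt_order, k)
    n = len(entered)
    phrase = "item moves" if n == 1 else "items move"
    return f"{n} {phrase} into top {k} under {label}"


# Hypothetical rankings of the same eight items under two weightings.
DEFAULT_ORDER = ["a", "b", "c", "d", "e", "f", "g", "h"]
FRESHNESS_ORDER = ["f", "a", "g", "b", "h", "c", "d", "e"]

ENTERED, LEFT = top_k_delta(DEFAULT_ORDER, FRESHNESS_ORDER, k=5)
SUMMARY = delta_summary(DEFAULT_ORDER, FRESHNESS_ORDER, k=5, label="freshness-heavy")
print(SUMMARY)
print("entered:", ENTERED, "left:", LEFT)
```

The same entered/left lists could be logged per canonical weighting to feed the merchant‑facing reporting, so that performance outside the default view stays visible.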