Among users who show early signs of rising workflow maturity (e.g., template reuse, multi-step chaining), which specific failures or breakdowns in their AI workflows most strongly predict whether they (a) deepen prompt skill and extend their workflows versus (b) revert to shallow, one-off prompting or abandon the tool—and how can products deliberately surface these “productive failures” without harming short-term task success?

anthropic-learning-curves

Answer

a) Failures that predict deeper skill/maturity

  1. Near-miss quality failures
  • Pattern: Output is close but needs edits (tone off, partial coverage, wrong format).
  • Predictive sign: User edits prompts or templates rather than abandoning; they rerun with variations.
  • Why productive: Keeps value visible while creating a reason to tinker.
  2. Boundary/edge-case failures
  • Pattern: Workflow works on standard cases but breaks on unusual inputs.
  • Predictive sign: User adds steps, conditions, or variants instead of discarding the flow.
  • Why productive: Teaches limits and encourages generalization.
  3. Chaining breakdowns, not total collapse
  • Pattern: One step in a multi-step chain fails (e.g., bad summary) but others are fine.
  • Predictive sign: User drills into that step, reorders, or inserts a new step.
  • Why productive: Focused troubleshooting builds workflow mental models.
  4. Misaligned assumptions, correctable in-prompt
  • Pattern: Model assumes wrong audience, goal, or constraints.
  • Predictive sign: User starts encoding these explicitly in prompts/templates.
  • Why productive: Pushes users toward more structured prompting.

b) Failures that predict reversion/abandonment

  1. Opaque, all-or-nothing failures
  • Pattern: One-click flow silently fails, hangs, or returns unusable output with no insight.
  • Outcome: User stops trusting the workflow; returns to one-off prompts or quits.
  2. Repeated brittle failures on core tasks
  • Pattern: Same important task fails several times in a row despite user effort.
  • Outcome: User labels the tool as unreliable for that job and stops investing.
  3. Misleading “looks good but wrong” failures
  • Pattern: Confident, polished, but factually or logically wrong outputs detected late.
  • Outcome: Perceived risk rises; users narrow usage to trivial tasks.
  4. Failures coupled with high switching cost
  • Pattern: Fixing requires leaving the tool, rewriting from scratch, or re-onboarding.
  • Outcome: Users choose manual work over further experimentation.

c) Product patterns to surface “productive failures” safely

  1. Make near-miss gaps visible and editable
  • Show quick, localized controls (e.g., "fix tone", "cover missing points") on imperfect outputs.
  • Show a side-by-side diff on reruns so users see the impact of prompt tweaks.
  2. Soft-landing step failures
  • When a chain step fails, run the rest on a best-effort basis and highlight the broken step.
  • Provide one-click suggestions: "tighten instructions", "add example", "split into two steps".
  3. Structured retries instead of blank slates
  • On low-quality results, offer 2–3 targeted retry options based on common fixes (more context, different style, add constraints) rather than a generic "try again".
  4. Safe sandboxes for high-risk tasks
  • For tasks where hidden errors are costly, route first runs to a review sandbox with checks (comparisons, validations) before real use.
  • Encourage prompt refinement in this zone so users learn without high stakes.
  5. Lightweight instrumentation of breakdowns
  • Detect patterns like: frequent edits after a specific step, repeated manual fixes in the same place, or high undo rates.
  • Trigger small tips or micro-tutorials tied to that step (not global training).
  6. Guardrails against demoralizing failures
  • Cap the number of consecutive failed runs before escalating support (guided mode, human help, or richer examples).
  • Keep an easy escape hatch back to the user’s last known-good simple workflow.
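The soft-landing and structured-retry patterns above can be sketched as a small chain runner. This is a minimal illustration, not a real API: the step names, the `run_step` callable, the use of `ValueError` as a failure signal, and the `RETRY_SUGGESTIONS` taxonomy are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical retry options keyed by failure kind (assumed taxonomy for the sketch).
RETRY_SUGGESTIONS = {
    "low_quality": ["add more context", "specify the target style", "add explicit constraints"],
    "format": ["show an example of the desired format", "split into two steps"],
}

@dataclass
class StepResult:
    name: str
    ok: bool
    output: str = ""
    failure_kind: str = ""
    suggestions: list = field(default_factory=list)

def run_chain(steps, run_step):
    """Run every step best-effort: a failed step is flagged with targeted
    retry options instead of aborting the whole chain (soft landing)."""
    results = []
    prev_output = ""
    for name in steps:
        try:
            prev_output = run_step(name, prev_output)
            results.append(StepResult(name, ok=True, output=prev_output))
        except ValueError as err:  # assumed failure signal for this sketch
            kind = str(err)
            results.append(StepResult(
                name, ok=False, failure_kind=kind,
                suggestions=RETRY_SUGGESTIONS.get(kind, ["try again with more detail"]),
            ))
            prev_output = ""  # downstream steps run without this step's output
    return results
```

The key design choice is that one broken step yields a localized, actionable result object rather than a chain-wide error, which is what keeps the failure explainable and low-severity.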
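The instrumentation and guardrail patterns can likewise be sketched as simple counters. The thresholds, signal names, and `WorkflowMonitor` class below are illustrative assumptions, not a prescribed implementation.

```python
from collections import Counter

EDIT_TIP_THRESHOLD = 3       # assumed: edits after the same step before surfacing a tip
FAIL_ESCALATE_THRESHOLD = 3  # assumed: consecutive failed runs before escalating support

class WorkflowMonitor:
    """Tracks per-step manual edits and consecutive failed runs, emitting a
    step-local tip or a support escalation when a cap is reached."""

    def __init__(self):
        self.edits_after_step = Counter()
        self.consecutive_failures = 0

    def record_edit(self, step_name):
        """Count a manual fix made right after a step; fire a micro-tutorial
        tied to that step (not global training) when the threshold is hit."""
        self.edits_after_step[step_name] += 1
        if self.edits_after_step[step_name] == EDIT_TIP_THRESHOLD:
            return f"tip:{step_name}"
        return None

    def record_run(self, success):
        """Count consecutive failed runs; escalate (guided mode, human help,
        richer examples) once the cap is reached. Success resets the streak."""
        if success:
            self.consecutive_failures = 0
            return None
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAIL_ESCALATE_THRESHOLD:
            return "escalate"
        return None
```

Because the tip fires only once at the threshold and successes reset the failure streak, the signals stay rare and targeted, which is what keeps them helpful rather than demoralizing.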

Net effect: Designers should aim for frequent, explainable, low-severity failures (near-miss, fixable in-UI) and avoid opaque, high-stakes, or high-friction failures. The former correlate with users extending chains and editing prompts; the latter correlate with reversion to shallow use or churn.