How should we design and validate teen-visible safety summaries so that they are short and concrete enough for everyday use, yet still give teens meaningful transparency about why an age-appropriate safeguard fired—without at the same time teaching easy evasion strategies or overwhelming them with policy jargon?

teen-safe-ai-ux

Answer

Use a small library of short, pattern-based summaries, each tied to policy-matrix cells and tested with teens for both clarity and resistance to gaming.

Design

  1. Map summaries to matrix cells
  • One summary pattern per (risk_area, intent, age_band) cluster, not per rule.
  • Keep length ~1–2 short sentences.
  1. Fixed structure
  • Template: (a) what was blocked, (b) simple reason, (c) what’s still allowed.
  • Example: “I can’t give step‑by‑step self-harm methods. I can help you understand those feelings and ways to cope or get support.”
  1. Avoid evasion recipes
  • Name categories, not triggers: “self-harm instructions”, not “these keywords”.
  • No hints about thresholds, detector types, or workarounds.
  • Reuse a small vocabulary: feelings, info, how‑to, safety rule, support.
  1. Teen-readable language tiers
  • Maintain 2–3 reading levels keyed to age_band, but same core meaning.
  • Ban policy jargon (“non-compliant”, “classifier threshold”). Use plain labels (“safety rule about…”, “I’m set to be careful about…”).
  1. Integration with graceful refusals
  • Safety summary is the first 1–2 sentences of a teen-specific refusal template.
  • Always follow with at least one allowed next step: safer rephrase, learning route, support option.
  1. Consistent icon and label
  • Use a single “safety info” icon + short label (e.g., “Why this is limited”) for all safeguards.
  • Tap/click to expand to a slightly longer explanation; default stays short.
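
The design points above can be sketched as a small lookup structure. This is a minimal illustration, not a definitive implementation: the cell labels (`"self_harm"`, `"how_to"`, `"13-15"`), the `summary_id` values, the expanded text, and the generic fallback are all assumptions made for the example. Only the self-harm wording comes from the example in item 2.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# A policy-matrix cell: (risk_area, intent, age_band). Labels are illustrative.
Cell = Tuple[str, str, str]


@dataclass(frozen=True)
class SafetySummary:
    summary_id: str
    blocked: str        # (a) what was blocked
    reason: str         # (b) simple reason; may be "" when (a) already implies it
    still_allowed: str  # (c) what's still allowed
    expanded: str       # slightly longer text shown on tap/click

    def short_text(self) -> str:
        """Join the non-empty template parts into the default short summary."""
        parts = (self.blocked, self.reason, self.still_allowed)
        return " ".join(p for p in parts if p)


# Hypothetical library: one pattern per cell cluster, not per rule.
LIBRARY: Dict[Cell, SafetySummary] = {
    ("self_harm", "how_to", "13-15"): SafetySummary(
        summary_id="sh_howto_13_15",
        blocked="I can't give step-by-step self-harm methods.",
        reason="",
        still_allowed=("I can help you understand those feelings "
                       "and ways to cope or get support."),
        expanded=("I have a safety rule about self-harm instructions. "
                  "I can still talk about feelings and support."),
    ),
}

# Assumed fallback for cells without a dedicated pattern.
FALLBACK = SafetySummary(
    summary_id="generic_fallback",
    blocked="I can't help with that part.",
    reason="I'm set to be careful about this topic.",
    still_allowed="I can help with safer info or point you to support.",
    expanded="A safety rule limits this answer. You can ask in a different way.",
)


def pick_summary(risk_area: str, intent: str, age_band: str) -> SafetySummary:
    """Middleware-side lookup: the cell decides the summary, not the model."""
    return LIBRARY.get((risk_area, intent, age_band), FALLBACK)


def render_refusal(summary: SafetySummary, next_step: str) -> str:
    """Put the summary first, then require at least one allowed next step."""
    if not next_step.strip():
        raise ValueError("a refusal needs at least one allowed next step")
    return f"{summary.short_text()} {next_step.strip()}"
```

Keeping the three template parts as separate fields makes it easy to check that every pattern states what was blocked and what is still allowed, while the icon and label only need `summary_id` and `expanded`.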

Validation

  7. Teen comprehension testing
  • Task: ask “In your own words, why was this limited?” and require at least 80–90% correct answers in the target age band.
  • Compare across risk areas; simplify any pattern with low comprehension.
  1. Non-gaming checks
  • Red-team with teens and adults: “Use this message to find a way around the rules.”
  • Reject or revise any summary that clearly teaches prompt-shaping or timing-based evasion.
  1. False-positive (FP) vs. underprotection telemetry
  • Log: (risk_area, intent, age_band, action, summary_id, user follow-up type).
  • Watch for high rates of immediate abandonment or angry retries on legitimate help topics; these suggest the summaries are too opaque or paternalistic.
  1. Developer-ready spec
  • For each summary_id store:
    • target_cells, text_snippets (short, expanded), reading_level, refusal_style_key.
  • Middleware picks summary_id from cell; models don’t improvise policy explanations.
  1. Regional overlays
  • Keep same structure; allow small, region-specific text variants tied to the same summary_id.
  • Forbid local variants from adding technical detail or evasion hints.
  1. Appeal / clarification path
  • When appropriate cells are hit (e.g., dual-use learning), summaries end with a neutral option: “If you were asking for something else, you can tell me that and I’ll try a safer answer.”
  • Route such turns to stricter intent checks without relaxing non-negotiables.
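
Two of the validation checks above lend themselves to automation, and the sketch below shows one possible shape for them. It is an illustration under stated assumptions: the banned-term list, the follow-up labels (`"abandon"`, `"angry_retry"`), and the thresholds `min_events` and `max_bad_rate` are hypothetical placeholders, and the caller is assumed to pass only events from legitimate help topics. Such a lint supplements red-teaming and teen testing; it does not replace them.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative list of jargon and trigger words a summary should not contain.
BANNED_TERMS = ("non-compliant", "classifier", "threshold", "keyword", "detector")
MAX_SENTENCES = 2  # the default summary stays at 1-2 short sentences

# Follow-up types treated as signs of an opaque or paternalistic summary.
BAD_FOLLOW_UPS = {"abandon", "angry_retry"}


def lint_summary(text: str) -> List[str]:
    """Return a list of problems found in a short summary (empty if none)."""
    problems = []
    lowered = text.lower()
    for term in BANNED_TERMS:
        if term in lowered:
            problems.append(f"banned term: {term}")
    # Rough sentence count: split on runs of ., ! or ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(sentences) > MAX_SENTENCES:
        problems.append(f"too long: {len(sentences)} sentences")
    return problems


def flag_opaque_summaries(
    events: Iterable[Tuple[str, str]],
    min_events: int = 20,
    max_bad_rate: float = 0.4,
) -> List[str]:
    """Flag summary_ids whose share of bad follow-ups exceeds max_bad_rate.

    Each event is (summary_id, follow_up_type); summaries with fewer than
    min_events logged events are skipped as too noisy to judge.
    """
    totals: Counter = Counter()
    bad: Counter = Counter()
    for summary_id, follow_up in events:
        totals[summary_id] += 1
        if follow_up in BAD_FOLLOW_UPS:
            bad[summary_id] += 1
    return sorted(
        sid for sid, n in totals.items()
        if n >= min_events and bad[sid] / n > max_bad_rate
    )
```

Running `lint_summary` over every regional variant of a `summary_id` before release is one way to enforce the rule that local text may not add technical detail.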