How should we design and validate teen-visible safety summaries so that they are short and concrete enough for everyday use, yet still give teens meaningful transparency about why an age-appropriate safeguard fired, without teaching easy evasion strategies or overwhelming them with policy jargon?
teen-safe-ai-ux
Answer
Use a small library of short, pattern-based summaries, each tied to policy-matrix cells and tested with teens for clarity and resistance to gaming.
Design
- Map summaries to matrix cells
  - One summary pattern per (risk_area, intent, age_band) cluster, not per rule.
  - Keep each summary to about 1–2 short sentences.
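This is a minimal Python sketch of cluster-level mapping. It assumes that many fine-grained rules collapse into one (risk_area, intent, age_band) cell and that each cell maps to a single summary pattern. The rule IDs, cell values, summary IDs, and the punctuation-based sentence counter are hypothetical placeholders, not part of any existing system.

```python
import re

# Hypothetical rule IDs: several fine-grained rules share one matrix cell.
RULE_TO_CELL: dict[str, tuple[str, str, str]] = {
    "sh_rule_012": ("self_harm", "how_to", "13-15"),
    "sh_rule_047": ("self_harm", "how_to", "13-15"),
    "wp_rule_003": ("weapons", "how_to", "13-15"),
}

# One summary pattern per (risk_area, intent, age_band) cluster, not per rule.
CELL_TO_SUMMARY: dict[tuple[str, str, str], str] = {
    ("self_harm", "how_to", "13-15"): "selfharm_howto",
    ("weapons", "how_to", "13-15"): "weapons_howto",
}


def summary_id_for_rule(rule_id: str) -> str:
    """Resolve a fired rule to its cluster's summary pattern."""
    return CELL_TO_SUMMARY[RULE_TO_CELL[rule_id]]


def sentence_count(text: str) -> int:
    """Rough count: split on runs of . ! ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def is_short_enough(text: str, max_sentences: int = 2) -> bool:
    """Check the 1-2 sentence length guideline."""
    return sentence_count(text) <= max_sentences
```

The sentence counter is deliberately crude. It overcounts abbreviations such as "e.g." followed by a space, so treat it as a lint rather than a hard gate.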
- Fixed structure
  - Template: (a) what was blocked, (b) simple reason, (c) what’s still allowed.
  - Example: “I can’t give step‑by‑step self-harm methods. I can help you understand those feelings and ways to cope or get support.”
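The fixed three-slot structure can be enforced in code so that no slot is silently dropped. This is a sketch under stated assumptions. `SummaryParts`, `render_summary`, and the sentence frame are illustrative names. The reason phrase in the example is hypothetical wording, not approved copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryParts:
    blocked: str  # (a) what was blocked, named as a category
    reason: str   # (b) simple, plain-language reason
    allowed: str  # (c) what is still allowed


def render_summary(parts: SummaryParts) -> str:
    """Fill a fixed two-sentence frame; refuse to render if any slot is empty."""
    for slot in ("blocked", "reason", "allowed"):
        if not getattr(parts, slot).strip():
            raise ValueError(f"missing slot: {slot}")
    return f"I can't give {parts.blocked} because {parts.reason}. I can {parts.allowed}."


example = SummaryParts(
    blocked="step-by-step self-harm methods",
    reason="they could put you in danger",  # hypothetical reason wording
    allowed="help you understand those feelings and ways to cope or get support",
)
print(render_summary(example))
```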
- Avoid evasion recipes
  - Name categories, not triggers: “self-harm instructions”, not “these keywords”.
  - No hints about thresholds, detector types, or workarounds.
  - Reuse a small vocabulary: feelings, info, how‑to, safety rule, support.
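A cheap automated lint can catch the most obvious mechanism-revealing words before human review. The deny-list below is a hypothetical starting point and assumes English-language summaries. It complements the red-teaming described under Validation rather than replacing it.

```python
import re

# Hypothetical deny-list: words that describe detection mechanics instead of categories,
# plus one timing hint ("try again later").
EVASION_HINT_PATTERNS = [
    r"\bkeywords?\b",
    r"\bthresholds?\b",
    r"\bdetectors?\b",
    r"\bclassifiers?\b",
    r"\bfilter(?:s|ed)?\b",
    r"\bwork ?arounds?\b",
    r"\btry again later\b",
]


def evasion_hints(text: str) -> list[str]:
    """Return every deny-listed pattern that matches the summary (case-insensitive)."""
    return [p for p in EVASION_HINT_PATTERNS if re.search(p, text, re.IGNORECASE)]
```

A regex list only catches wording someone anticipated. Paraphrased hints still need human review.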
- Teen-readable language tiers
  - Maintain 2–3 reading levels keyed to age_band, all with the same core meaning.
  - Ban policy jargon (“non-compliant”, “classifier threshold”); use plain labels instead (“safety rule about…”, “I’m set to be careful about…”).
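One way to keep tiers aligned is to store every reading level under the same summary_id and lint each variant for banned jargon. The age bands, tier names, and simpler wording below are assumptions for illustration. The jargon list uses only the two terms named above.

```python
# Hypothetical age bands and tier names; real values would come from the policy matrix.
TIER_BY_AGE_BAND = {"13-14": "simple", "15-16": "standard", "17": "standard"}

# Terms the guidance bans in teen-facing text.
JARGON = ("non-compliant", "classifier threshold")

# One summary_id, several reading levels, same core meaning.
TEXT_BY_TIER = {
    "selfharm_howto": {
        "simple": "I can't share ways to hurt yourself. "
                  "I can help with how you feel and who you can talk to.",
        "standard": "I can't give step-by-step self-harm methods. "
                    "I can help you understand those feelings and ways to cope or get support.",
    },
}


def summary_text(summary_id: str, age_band: str) -> str:
    """Pick the tier for an age band, falling back to the simplest tier if the band is unknown."""
    tier = TIER_BY_AGE_BAND.get(age_band, "simple")
    return TEXT_BY_TIER[summary_id][tier]


def jargon_found(text: str) -> list[str]:
    """Return any banned jargon terms present in the text."""
    lowered = text.lower()
    return [term for term in JARGON if term in lowered]
```

Falling back to the simplest tier is one reasonable default when age_band is missing. A stricter setup might refuse to render instead.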
- Integration with graceful refusals
  - The safety summary forms the first 1–2 sentences of a teen-specific refusal template.
  - Always follow it with at least one allowed next step: a safer rephrasing, a learning route, or a support option.
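This minimal sketch models the refusal template as a data structure that cannot render without a next step. `TeenRefusal` and the next-step kind labels are hypothetical. The three kinds mirror the options listed above.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three kinds of allowed next step.
NEXT_STEP_KINDS = {"safer_rephrase", "learning_route", "support_option"}


@dataclass
class TeenRefusal:
    summary: str  # the 1-2 sentence safety summary; always rendered first
    next_steps: list[tuple[str, str]] = field(default_factory=list)  # (kind, text)

    def render(self) -> str:
        """Summary first, then each allowed next step as a bullet."""
        if not self.next_steps:
            raise ValueError("a teen refusal needs at least one allowed next step")
        for kind, _ in self.next_steps:
            if kind not in NEXT_STEP_KINDS:
                raise ValueError(f"unknown next-step kind: {kind}")
        return "\n".join([self.summary] + [f"- {text}" for _, text in self.next_steps])
```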
- Consistent icon and label
  - Use a single “safety info” icon and a short label (e.g., “Why this is limited”) for all safeguards.
  - Tapping or clicking expands it to a slightly longer explanation; the default view stays short.
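The collapsed/expanded behavior amounts to a small piece of UI state. This sketch assumes text-only rendering and a hypothetical icon asset name; a real client would draw the icon graphically.

```python
from dataclasses import dataclass

SAFETY_LABEL = "Why this is limited"  # one label shared by every safeguard
SAFETY_ICON = "safety_info"           # hypothetical icon asset name


@dataclass
class SafetyInfoChip:
    short_text: str
    expanded_text: str
    expanded: bool = False  # default view stays short

    def toggle(self) -> None:
        """Handle a tap or click by switching between short and expanded text."""
        self.expanded = not self.expanded

    def visible_text(self) -> str:
        body = self.expanded_text if self.expanded else self.short_text
        return f"[{SAFETY_ICON}] {SAFETY_LABEL}: {body}"
```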
Validation
- Teen comprehension testing
  - Task: ask “In your own words, why was this limited?” and check that at least 80–90% of answers are correct for the target age band.
  - Compare results across risk areas; simplify any pattern with low comprehension.
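Scoring the comprehension task is simple arithmetic once a reviewer has coded each teen answer as correct or incorrect. The sketch below assumes that coding step has already happened. It uses 0.8, the low end of the 80–90% range, as the default pass mark.

```python
from collections import defaultdict


def comprehension_rates(responses):
    """responses: iterable of (summary_id, age_band, correct) with correct already coded by a reviewer."""
    counts = defaultdict(lambda: [0, 0])  # (summary_id, age_band) -> [n_correct, n_total]
    for summary_id, age_band, correct in responses:
        c = counts[(summary_id, age_band)]
        c[0] += int(correct)
        c[1] += 1
    return {key: n_correct / n_total for key, (n_correct, n_total) in counts.items()}


def needs_simplifying(rates, target=0.8):
    """Patterns (per age band) whose comprehension falls below the target share."""
    return sorted(key for key, rate in rates.items() if rate < target)
```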
- Non-gaming checks
  - Red-team with teens and adults: “Use this message to find a way around the rules.”
  - Reject or revise any summary that clearly teaches prompt-shaping or timing-based evasion.
- False-positive (FP) vs. underprotection telemetry
  - Log: (risk_area, intent, age_band, action, summary_id, user follow-up type).
  - Watch for high rates of immediate abandonment or angry retries on legitimate help topics; these suggest the summaries are too opaque or paternalistic.
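The log tuple above maps naturally to a record type, and the "too opaque or paternalistic" signal maps to a per-summary rate. The follow-up type labels, the intent names, and passing in a set of "legitimate help" intents are all assumptions made for this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeguardEvent:
    risk_area: str
    intent: str
    age_band: str
    action: str      # e.g. "refuse" or "partial"; hypothetical labels
    summary_id: str
    follow_up: str   # hypothetical: "abandon", "angry_retry", "rephrase", "next_step"


# Follow-ups that suggest a summary felt opaque or paternalistic.
OPACITY_SIGNALS = {"abandon", "angry_retry"}


def opacity_rate_by_summary(events, legit_intents):
    """Per summary_id, share of legitimate-help events followed by an abandon or angry retry."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for e in events:
        if e.intent not in legit_intents:
            continue
        totals[e.summary_id] += 1
        if e.follow_up in OPACITY_SIGNALS:
            hits[e.summary_id] += 1
    return {sid: hits[sid] / n for sid, n in totals.items()}
```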
- Developer-operationalizable spec
  - For each summary_id, store:
    - target_cells, text_snippets (short, expanded), reading_level, refusal_style_key.
  - Middleware picks the summary_id from the matched cell; models do not improvise policy explanations.
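This minimal registry sketch covers the per-summary_id spec and assumes each cell belongs to exactly one summary. Field names follow the list above, with text_snippets split into `short_text` and `expanded_text`. `SummaryRegistry` and its `pick` method are hypothetical.

```python
from dataclasses import dataclass

Cell = tuple[str, str, str]  # (risk_area, intent, age_band)


@dataclass(frozen=True)
class SummarySpec:
    summary_id: str
    target_cells: frozenset[Cell]
    short_text: str
    expanded_text: str
    reading_level: str
    refusal_style_key: str


class SummaryRegistry:
    def __init__(self, specs):
        self._by_cell: dict[Cell, SummarySpec] = {}
        for spec in specs:
            for cell in spec.target_cells:
                # Each cell must resolve to exactly one summary.
                if cell in self._by_cell:
                    raise ValueError(f"cell {cell} mapped to two summaries")
                self._by_cell[cell] = spec

    def pick(self, cell: Cell) -> SummarySpec:
        """Middleware lookup: the explanation text comes from the spec, never from the model."""
        return self._by_cell[cell]
```

Rejecting duplicate cells at load time keeps the mapping deterministic. Here, a lookup for an unmapped cell raises `KeyError`; a production system would need a deliberate fallback.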
- Regional overlays
  - Keep the same structure; allow small region-specific text variants tied to the same summary_id.
  - Forbid local variants from adding technical detail or evasion hints.
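Regional variants can be validated against their base text before they ship. The region code, the 1.5× length cap standing in for "small", and the forbidden-term list are all assumptions for this sketch.

```python
import re

# Hypothetical markers of technical detail or evasion hints.
FORBIDDEN_IN_VARIANTS = [r"\bkeywords?\b", r"\bthresholds?\b", r"\bclassifiers?\b", r"\bwork ?arounds?\b"]

BASE_TEXT = {
    "selfharm_howto": "I can't give step-by-step self-harm methods. I can help you cope or find support.",
}

# Region-specific variants share the base summary_id; "UK" is an illustrative region code.
REGIONAL_TEXT = {
    ("selfharm_howto", "UK"): "I can't give step-by-step self-harm methods. "
                              "I can help you cope or find support, including services near you.",
}


def text_for(summary_id: str, region: str) -> str:
    """Use the regional variant if one exists, else the base text."""
    return REGIONAL_TEXT.get((summary_id, region), BASE_TEXT[summary_id])


def variant_is_acceptable(summary_id: str, region: str, max_growth: float = 1.5) -> bool:
    """A variant must stay small relative to the base and contain no forbidden terms."""
    base = BASE_TEXT[summary_id]
    variant = REGIONAL_TEXT[(summary_id, region)]
    if len(variant) > max_growth * len(base):
        return False
    return not any(re.search(p, variant, re.IGNORECASE) for p in FORBIDDEN_IN_VARIANTS)
```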
- Appeal / clarification path
  - When relevant cells are hit (e.g., dual-use learning), the summary ends with a neutral option: “If you were asking for something else, you can tell me that and I’ll try a safer answer.”
  - Route the follow-up turn to stricter intent checks without relaxing non-negotiables.
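This sketch of the appeal path assumes that intents are plain string labels and that turn state carries a strict-intent flag plus a set of non-negotiable blocks. `CLARIFIABLE_INTENTS`, `TurnState`, and the block names are hypothetical. The key property is that the next turn can only become stricter.

```python
from dataclasses import dataclass

CLARIFY_OPTION = ("If you were asking for something else, you can tell me that "
                  "and I'll try a safer answer.")

# Hypothetical: intents whose summaries end with the clarification option.
CLARIFIABLE_INTENTS = {"dual_use_learning"}


@dataclass(frozen=True)
class TurnState:
    strict_intent_check: bool = False
    non_negotiable_blocks: frozenset[str] = frozenset()


def summary_with_appeal(summary: str, intent: str) -> str:
    """Append the neutral clarification option only for clarifiable cells."""
    if intent in CLARIFIABLE_INTENTS:
        return f"{summary} {CLARIFY_OPTION}"
    return summary


def next_turn_state(prev: TurnState, offered_appeal: bool) -> TurnState:
    """Tighten intent checks after an appeal offer; carry non-negotiable blocks over unchanged."""
    return TurnState(
        strict_intent_check=prev.strict_intent_check or offered_appeal,
        non_negotiable_blocks=prev.non_negotiable_blocks,
    )
```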