What combination of teen-visible safety summaries and inline graceful refusals (for example, static “safety cards,” first-session walkthroughs, and per-refusal explanations) most reliably helps different age bands of teens predict when they will be blocked, and how does better prediction change their later behavior on high-risk topics like self-harm and sex-ed across chat, search, and creative products?
teen-safe-ai-ux
Answer
The most promising design is a simple, layered package: (1) a short, revisitable safety card stating 3–5 stable rules; (2) a brief, skippable first-session walkthrough showing 1–2 concrete refusal examples; and (3) terse, consistent per-refusal explanations that reuse the same rule language. Prediction accuracy and subsequent behavior appear most sensitive to the per-refusal layer; cards and walkthroughs mainly set the frame.
- Likely effective combinations by age band
  - Younger teens (13–15)
    - Safety card: always-visible entry point (“What this AI can’t do for safety”) with 3–4 rules (e.g., no self-harm instructions, no explicit sex, no hate/harassment, same rules for everyone your age).
    - Walkthrough: 1–2 screens plus one demo refusal for a self-harm-like query and one contrasting sex-ed with porn, each with a clear “what I can do instead” example.
    - Inline refusals: goal-first, 1–2 sentences; always name the same rule category (“self-harm methods are always blocked for teens”) plus one safe alternative (emotional support, high-level facts, vetted links).
  - Older teens (16–17)
    - Safety card: shorter and autonomy-framed (“I have to follow these rules for under-18s”); clarify that factual sex-ed and mental-health support are allowed, but porn and self-harm methods are not.
    - Walkthrough: optional; focus on nuance where policy differs from adult products or from school (e.g., why some sex-ed is allowed but porn is not).
    - Inline refusals: briefer and more direct; emphasize consistency and domain best practice (“I follow youth mental-health guidance, so I can’t give methods, but I can talk about coping and treatment.”).
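The age-band tuning above amounts to varying presentation, not policy. A minimal sketch of that idea, assuming hypothetical band names and fields (`BandStyle`, `BAND_STYLES`, and all values are illustrative, not a real product's configuration):

```python
from dataclasses import dataclass

# Hypothetical sketch: same underlying rules, different presentation
# parameters per age band. Fields and values are illustrative only.

@dataclass(frozen=True)
class BandStyle:
    card_rule_count: int        # how many rules the safety card shows
    walkthrough_required: bool  # younger teens see it by default
    refusal_max_sentences: int  # terser copy for older teens
    tone: str                   # "plain" (13-15) vs "autonomy" (16-17)

BAND_STYLES = {
    "13-15": BandStyle(card_rule_count=4, walkthrough_required=True,
                       refusal_max_sentences=2, tone="plain"),
    "16-17": BandStyle(card_rule_count=3, walkthrough_required=False,
                       refusal_max_sentences=1, tone="autonomy"),
}

def style_for_age(age: int) -> BandStyle:
    """Map an age to its band style; ages outside 13-17 are out of scope."""
    if 13 <= age <= 15:
        return BAND_STYLES["13-15"]
    if 16 <= age <= 17:
        return BAND_STYLES["16-17"]
    raise ValueError("teen bands cover ages 13-17 only")
```

Keeping the rules themselves band-independent, with only these style knobs varying, is what lets the refusal wording stay identical to the card wording in both bands.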
- Effects on behavior (hypothesized)
- Better prediction tends to:
  - reduce obviously disallowed probes after 1–2 blocked turns in that domain;
  - shift queries toward allowed sub-intents (e.g., from “how to self-harm” to “why do I feel this way?”, or from porn to contraception info);
  - lower repeated appeals on clearly correct blocks when rule wording is identical across card, walkthrough, and refusal copy.
- Gains are strongest on:
  - self-harm and eating-disorder instructions;
  - explicit sex and exploitation;
  - harassment “how-to” and doxxing.
- Cross-surface notes (chat, search, creative)
- Chat: main driver of understanding; inline refusal and follow-up suggestions matter most.
- Search: small banners/tooltips tied to filtered results plus a safety card link; explanations must be extremely short.
- Creative tools: a brief pre-prompt warning (“NSFW and self-harm content will be blocked”) plus small inline badges when generations are limited.
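The cross-surface notes above reduce to one constraint: identical rule wording, surface-specific length budgets. A minimal sketch under assumed surface names and character limits (all values hypothetical):

```python
# Hypothetical sketch: trim one shared refusal explanation to fit each
# surface. Surface names and character budgets are illustrative.

SURFACE_LIMITS = {
    "chat": 200,     # full refusal plus a follow-up suggestion
    "search": 80,    # banner/tooltip next to filtered results
    "creative": 60,  # small inline badge on a limited generation
}

def render_refusal(explanation: str, surface: str) -> str:
    """Reuse identical rule wording everywhere; only length varies."""
    limit = SURFACE_LIMITS[surface]
    if len(explanation) <= limit:
        return explanation
    # Truncate on a word boundary and point at the safety card for the rest.
    clipped = explanation[:limit].rsplit(" ", 1)[0]
    return clipped + "… (see safety card)"
```

Truncating rather than paraphrasing per surface is the design choice that keeps the search banner and the creative-tool badge recognizably the same rule the chat refusal stated.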
- Developer-operationalizable pattern
- Reuse a small label set (e.g., self-harm, sex/porn vs sex-ed, harassment, illegal activity) from the risk×intent×age matrix.
- For each label, define:
  - 1–2 rule sentences for the card/walkthrough;
  - a matching 1-sentence refusal explanation;
  - 1–2 safe-intent suggestions.
- Ensure all surfaces pull from the same tiny copy table so behavior matches explanation.
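One way to make the shared copy table concrete. This is a minimal sketch, not a real schema: the label keys follow the list above (a subset, omitting illegal activity), and the field names and copy strings are assumptions for illustration:

```python
# Hypothetical sketch: one copy table keyed by policy label, consumed by
# every surface so refusal behavior always matches its explanation.

COPY_TABLE = {
    "self_harm": {
        "rule": "Self-harm methods are always blocked for teens.",
        "refusal": "I can't give self-harm methods, but I can talk about coping and support.",
        "safe_intents": ["coping strategies", "how to reach a helpline"],
    },
    "sex_vs_sex_ed": {
        "rule": "Explicit sexual content is blocked; factual sex-ed is allowed.",
        "refusal": "I can't produce explicit content, but I can answer factual sex-ed questions.",
        "safe_intents": ["contraception basics", "consent and healthy relationships"],
    },
    "harassment": {
        "rule": "Harassment how-tos and doxxing are always blocked.",
        "refusal": "I can't help target someone, but I can discuss handling conflict.",
        "safe_intents": ["responding to bullying", "reporting tools"],
    },
}

def refusal_copy(label: str) -> str:
    """Every surface calls this one function, so wording never drifts."""
    entry = COPY_TABLE[label]
    return f"{entry['refusal']} ({entry['rule']})"
```

Because cards, walkthroughs, and inline refusals all read from `COPY_TABLE`, a copy change lands everywhere at once, which is what keeps the “you said X, but you did Y” failure mode (noted under risks below) from being introduced by drift.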
- Behavioral risks and limits
- Some teens will probe anyway (reactance, curiosity); clarity won’t fully stop this.
- If policies are inconsistent across topics or products, clearer text can increase frustration (“you said X, but you did Y”).
- Overlong cards or walkthroughs are likely ignored; most value comes from the first few refusals in each high-risk domain.
Overall: use one shared, label-based rule language across safety cards, walkthroughs, and refusals; tune tone and length by age band and surface; rely on per-refusal explanations as the main lever on day-to-day teen behavior, especially around self-harm and sex-ed.