What combination of teen-visible safety summaries and inline graceful refusals (for example, static “safety cards,” first-session walkthroughs, and per-refusal explanations) most reliably helps different age bands of teens predict when they will be blocked, and how does better prediction change their later behavior on high-risk topics like self-harm and sex-ed across chat, search, and creative products?
teen-safe-ai-ux
Answer
The most promising design is a simple, layered package: (1) a short, revisitable safety card stating 3–5 stable rules; (2) a brief, skippable first-session walkthrough showing 1–2 concrete refusal examples; and (3) terse, consistent per-refusal explanations that reuse the same rule language. Prediction accuracy and subsequent behavior appear most sensitive to the per-refusal layer; cards and walkthroughs mainly set the frame.
- Likely effective combinations by age band
  - Younger teens (13–15)
    - Safety card: always-visible entry point (“What this AI can’t do for safety”) with 3–4 rules (e.g., no self-harm instructions, no explicit sex, no hate/harassment, same rules for everyone your age).
    - Walkthrough: 1–2 screens plus one demo refusal for a self-harm-like query and one contrasting sex-ed with porn, each with a clear “what I can do instead” example.
    - Inline refusals: goal-first, 1–2 sentences; always name the same rule category (“self-harm methods are always blocked for teens”) plus one safe alternative (emotional support, high-level facts, vetted links).
  - Older teens (16–17)
    - Safety card: shorter and autonomy-framed (“I have to follow these rules for under-18s”); clarify that factual sex-ed and mental-health support are allowed, but porn and self-harm methods are not.
    - Walkthrough: optional; focus on nuance where policy differs from adult products or from school (e.g., why some sex-ed is allowed but porn is not).
    - Inline refusals: briefer and more direct; emphasize consistency and domain best practice (“I follow youth mental-health guidance, so I can’t give methods, but I can talk about coping and treatment.”).
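The age-band tuning above amounts to varying presentation, not policy. A minimal sketch of that idea, assuming hypothetical band names and fields (`BandStyle`, `BAND_STYLES`, and all values are illustrative, not a real product's configuration):

```python
from dataclasses import dataclass

# Hypothetical sketch: same underlying rules, different presentation
# parameters per age band. Fields and values are illustrative only.

@dataclass(frozen=True)
class BandStyle:
    card_rule_count: int        # how many rules the safety card shows
    walkthrough_required: bool  # younger teens see it by default
    refusal_max_sentences: int  # terser copy for older teens
    tone: str                   # "plain" (13-15) vs "autonomy" (16-17)

BAND_STYLES = {
    "13-15": BandStyle(card_rule_count=4, walkthrough_required=True,
                       refusal_max_sentences=2, tone="plain"),
    "16-17": BandStyle(card_rule_count=3, walkthrough_required=False,
                       refusal_max_sentences=1, tone="autonomy"),
}

def style_for_age(age: int) -> BandStyle:
    """Map an age to its band style; ages outside 13-17 are out of scope."""
    if 13 <= age <= 15:
        return BAND_STYLES["13-15"]
    if 16 <= age <= 17:
        return BAND_STYLES["16-17"]
    raise ValueError("teen bands cover ages 13-17 only")
```

Keeping the rules themselves band-independent, with only these style knobs varying, is what lets the refusal wording stay identical to the card wording in both bands.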
- Effects on behavior (hypothesized)
- Better prediction tends to:
  - reduce obviously disallowed probes after 1–2 blocked turns in that domain;
  - shift queries toward allowed sub-intents (e.g., from “how to self-harm” to “why do I feel this way?”, or from porn to contraception info);
  - lower repeated appeals on clearly correct blocks when rule wording is identical across card, walkthrough, and refusal copy.
- Gains are strongest on:
  - self-harm and eating-disorder instructions;
  - explicit sex and exploitation;
  - harassment “how-to” and doxxing.
- Cross-surface notes (chat, search, creative)
- Chat: main driver of understanding; inline refusal and follow-up suggestions matter most.
- Search: small banners/tooltips tied to filtered results plus a safety card link; explanations must be extremely short.
- Creative tools: a brief pre-prompt warning (“NSFW and self-harm content will be blocked”) plus small inline badges when generations are limited.
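The cross-surface notes above reduce to one constraint: identical rule wording, surface-specific length budgets. A minimal sketch under assumed surface names and character limits (all values hypothetical):

```python
# Hypothetical sketch: trim one shared refusal explanation to fit each
# surface. Surface names and character budgets are illustrative.

SURFACE_LIMITS = {
    "chat": 200,     # full refusal plus a follow-up suggestion
    "search": 80,    # banner/tooltip next to filtered results
    "creative": 60,  # small inline badge on a limited generation
}

def render_refusal(explanation: str, surface: str) -> str:
    """Reuse identical rule wording everywhere; only length varies."""
    limit = SURFACE_LIMITS[surface]
    if len(explanation) <= limit:
        return explanation
    # Truncate on a word boundary and point at the safety card for the rest.
    clipped = explanation[:limit].rsplit(" ", 1)[0]
    return clipped + "… (see safety card)"
```

Truncating rather than paraphrasing per surface is the design choice that keeps the search banner and the creative-tool badge recognizably the same rule the chat refusal stated.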
- Developer-operationalizable pattern
- Reuse a small label set (e.g., self-harm, sex/porn vs sex-ed, harassment, illegal activity) from the risk×intent×age matrix.
- For each label, define:
  - 1–2 rule sentences for the card/walkthrough;
  - a matching 1-sentence refusal explanation;
  - 1–2 safe-intent suggestions.
- Ensure all surfaces pull from the same tiny copy table so behavior matches explanation.
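One way to make the shared copy table concrete. This is a minimal sketch, not a real schema: the label keys follow the list above (a subset, omitting illegal activity), and the field names and copy strings are assumptions for illustration:

```python
# Hypothetical sketch: one copy table keyed by policy label, consumed by
# every surface so refusal behavior always matches its explanation.

COPY_TABLE = {
    "self_harm": {
        "rule": "Self-harm methods are always blocked for teens.",
        "refusal": "I can't give self-harm methods, but I can talk about coping and support.",
        "safe_intents": ["coping strategies", "how to reach a helpline"],
    },
    "sex_vs_sex_ed": {
        "rule": "Explicit sexual content is blocked; factual sex-ed is allowed.",
        "refusal": "I can't produce explicit content, but I can answer factual sex-ed questions.",
        "safe_intents": ["contraception basics", "consent and healthy relationships"],
    },
    "harassment": {
        "rule": "Harassment how-tos and doxxing are always blocked.",
        "refusal": "I can't help target someone, but I can discuss handling conflict.",
        "safe_intents": ["responding to bullying", "reporting tools"],
    },
}

def refusal_copy(label: str) -> str:
    """Every surface calls this one function, so wording never drifts."""
    entry = COPY_TABLE[label]
    return f"{entry['refusal']} ({entry['rule']})"
```

Because cards, walkthroughs, and inline refusals all read from `COPY_TABLE`, a copy change lands everywhere at once, which is what keeps the “you said X, but you did Y” failure mode (noted under risks below) from being introduced by drift.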
- Behavioral risks and limits
- Some teens will probe anyway (reactance, curiosity); clarity won’t fully stop this.
- If policies are inconsistent across topics or products, clearer text can increase frustration (“you said X, but you did Y”).
- Overlong cards or walkthroughs are likely ignored; most value comes from the first few refusals in each high-risk domain.
Overall: use one shared, label-based rule language across safety cards, walkthroughs, and refusals; tune tone and length by age band and surface; rely on per-refusal explanations as the main lever on day-to-day teen behavior, especially around self-harm and sex-ed.