If we treat teens’ own safety strategies (e.g., self-censoring, code words, switching apps) as first-class signals rather than adversarial behavior, how should age-appropriate safeguards and graceful refusals be redesigned to incorporate those signals—so that systems can adapt to perceived overblocking or mistrust—without collapsing into permissiveness or enabling harm?
teen-safe-ai-ux | Updated at
Answer
Treat teen tactics as safety and trust signals, not just evasion, and wire them into policy as constrained adaptations:
- Treat teen tactics as signals
  - Log patterns such as euphemisms, code words, and channel switching under one of {possible_overblocking, possible_mistrust, possible_rule_evasion}, each with an uncertainty estimate.
  - Never use these signals to relax non‑negotiables.
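The logging step above could be represented roughly as follows. This is a minimal sketch, not a reference implementation: `SignalLabel`, `TacticSignal`, the per-label confidence scores, and the 0.6 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class SignalLabel(Enum):
    POSSIBLE_OVERBLOCKING = "possible_overblocking"
    POSSIBLE_MISTRUST = "possible_mistrust"
    POSSIBLE_RULE_EVASION = "possible_rule_evasion"


@dataclass
class TacticSignal:
    tactic: str                       # e.g. "euphemism", "code_word", "channel_switch"
    scores: Dict[SignalLabel, float]  # assumed classifier confidence per label, 0.0 to 1.0

    def top_label(self, min_confidence: float = 0.6) -> Optional[SignalLabel]:
        """Return the most likely label, or None if nothing clears the threshold."""
        label, score = max(self.scores.items(), key=lambda item: item[1])
        return label if score >= min_confidence else None


clear = TacticSignal("code_word", {
    SignalLabel.POSSIBLE_OVERBLOCKING: 0.7,
    SignalLabel.POSSIBLE_MISTRUST: 0.2,
    SignalLabel.POSSIBLE_RULE_EVASION: 0.1,
})
ambiguous = TacticSignal("channel_switch", {
    SignalLabel.POSSIBLE_OVERBLOCKING: 0.4,
    SignalLabel.POSSIBLE_MISTRUST: 0.35,
    SignalLabel.POSSIBLE_RULE_EVASION: 0.25,
})
print(clear.top_label())      # SignalLabel.POSSIBLE_OVERBLOCKING
print(ambiguous.top_label())  # None
```

Returning `None` for ambiguous cases keeps the uncertainty visible downstream instead of forcing every tactic into a single interpretation.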
- Matrix-level adaptations (policy)
  - Add a "friction_signal" flag to the cells of the existing risk×intent×age matrix.
  - When friction recurs on a cell (e.g., legitimate learning queries being refused):
    - allow a slightly more explanatory partial answer and clearer rule text;
    - keep method/how‑to bans identical.
  - Apply adjustments product-wide, not per user, unless a per-user change has been reviewed.
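One way the matrix-level flag might look in code is sketched below. The `MatrixCell` fields, the `FRICTION_THRESHOLD` value, and the two-level `explanation_level` are hypothetical; only the `friction_signal` flag and the fixed method ban come from the text above.

```python
from dataclasses import dataclass

FRICTION_THRESHOLD = 20  # assumed count of product-wide friction reports before flagging


@dataclass
class MatrixCell:
    risk: str
    intent: str
    age_band: str
    methods_allowed: bool = False  # method/how-to ban; never changed by friction
    explanation_level: int = 0     # 0 = brief refusal, 1 = fuller partial answer and rule text
    friction_count: int = 0
    friction_signal: bool = False


def record_friction(cell: MatrixCell) -> None:
    """Count one friction report for this cell, aggregated across all users."""
    cell.friction_count += 1
    if cell.friction_count >= FRICTION_THRESHOLD:
        cell.friction_signal = True


def apply_friction_adaptation(cell: MatrixCell) -> MatrixCell:
    """Raise explanation detail on flagged cells; leave the method ban untouched."""
    if cell.friction_signal:
        cell.explanation_level = 1
    return cell


cell = MatrixCell(risk="medium", intent="learning", age_band="13-15")
for _ in range(FRICTION_THRESHOLD):
    record_friction(cell)
apply_friction_adaptation(cell)
print(cell.friction_signal, cell.explanation_level, cell.methods_allowed)  # True 1 False
```

Because the counter lives on the cell rather than on a user record, the adaptation is product-wide by construction.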
- Per-session adaptations (runtime)
  - If a teen rephrases around blocks 2–3 times without clear harm intent:
    - switch to a meta-explanation: "It looks like my safety rules may feel too strict here; here’s what I can and can’t do, and why."
    - offer guided rephrasing and safer angles.
  - If probes resemble rule evasion (e.g., obfuscating targets, locations, or methods):
    - keep current strictness; trigger shorter, firmer graceful refusals and, if needed, cool‑downs (reusing the per-topic counter pattern).
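The per-session branching above can be sketched as a small state machine. The names `SessionState` and `next_action`, the action labels, and both thresholds are assumptions for illustration; the upstream `blocked` and `evasion_suspected` inputs are taken as given.

```python
from dataclasses import dataclass
from typing import List

REPHRASE_LIMIT = 2   # the text suggests 2-3 rephrases; 2 is assumed here
COOL_DOWN_AFTER = 3  # assumed number of evasion-like probes before a cool-down


@dataclass
class SessionState:
    blocked_turns: int = 0
    evasion_probes: int = 0


def next_action(state: SessionState, blocked: bool, evasion_suspected: bool) -> str:
    """Pick the response mode for one turn; strictness itself never changes."""
    if not blocked:
        return "answer"
    if evasion_suspected:
        state.evasion_probes += 1
        if state.evasion_probes >= COOL_DOWN_AFTER:
            return "cool_down"
        return "firm_refusal"
    state.blocked_turns += 1
    rephrases = state.blocked_turns - 1  # the first blocked turn is the original ask
    if rephrases >= REPHRASE_LIMIT:
        return "meta_explanation"
    return "standard_refusal"


session = SessionState()
actions: List[str] = [next_action(session, True, False) for _ in range(3)]
print(actions)  # ['standard_refusal', 'standard_refusal', 'meta_explanation']
```

Both branches change only the style of the reply. Neither one alters what content is allowed.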
- Refusal style changes
  - For perceived overblocking:
    - emphasize a consistent rule, respect for curiosity, and a concrete allowed scope;
    - show a brief teen-visible safety summary.
  - For perceived mistrust:
    - acknowledge it: "I’m following rules that apply to everyone your age, not judging you."
    - invite opt-in feedback on whether the answer was useful.
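The two refusal styles could be kept as a small template table, as in this sketch. Apart from the quoted mistrust sentence, the template wording, the dictionary keys, and the `render_refusal` helper are illustrative assumptions.

```python
REFUSAL_TEMPLATES = {
    "overblocking": (
        "This rule is the same for everyone, and it's fine to be curious about this. "
        "Here is what I can help with: {allowed_scope}. "
        "Safety summary: {safety_summary}"
    ),
    "mistrust": (
        "I'm following rules that apply to everyone your age, not judging you. "
        "If you want, you can tell me whether this answer was useful."
    ),
    "default": "I can't help with that, but I can help with: {allowed_scope}.",
}


def render_refusal(perceived: str, allowed_scope: str, safety_summary: str = "") -> str:
    """Fill the template for the perceived issue, falling back to the default."""
    template = REFUSAL_TEMPLATES.get(perceived, REFUSAL_TEMPLATES["default"])
    return template.format(allowed_scope=allowed_scope, safety_summary=safety_summary)


print(render_refusal(
    "overblocking",
    allowed_scope="how medicines are tested for safety",
    safety_summary="I don't give dosing or how-to details.",
))
```

A fixed table keeps refusals consistent across sessions, which supports the "consistent rule" emphasis.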
- Developer-operationalizable rules
  - Implementation hooks:
    - the classifier adds {overblock_suspected?, rule_evasion_suspected?} tags to each turn;
    - middleware chooses one of a few refusal templates and decides whether to show extra explanation;
    - metrics: count sessions in which repeated rephrasing still ends in a block versus a partial answer.
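The metric in the last hook could be computed along these lines. This is a sketch under assumptions: the `Turn` record, its `is_rephrase` field (presumed to come from an upstream rephrase detector), and the `min_rephrases` default are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    outcome: str                         # "answer", "partial", or "block"
    is_rephrase: bool = False            # assumed output of an upstream rephrase detector
    overblock_suspected: bool = False    # per-turn classifier tag
    rule_evasion_suspected: bool = False # per-turn classifier tag


def rephrase_outcomes(sessions: List[List[Turn]], min_rephrases: int = 2) -> Counter:
    """Count sessions with repeated rephrasing by how their final turn ended."""
    counts: Counter = Counter()
    for turns in sessions:
        if not turns:
            continue
        rephrases = sum(turn.is_rephrase for turn in turns)
        final = turns[-1].outcome
        if rephrases >= min_rephrases and final in ("block", "partial"):
            counts[final] += 1
    return counts


sessions = [
    [Turn("block"), Turn("block", is_rephrase=True), Turn("block", is_rephrase=True)],
    [Turn("block"), Turn("block", is_rephrase=True), Turn("partial", is_rephrase=True)],
    [Turn("block"), Turn("answer", is_rephrase=True)],  # too few rephrases, ignored
]
result = rephrase_outcomes(sessions)
print(result["block"], result["partial"])  # 1 1
```

A rising share of "block" endings on a cell is one possible input to the friction flag, while the two classifier tags stay available for slicing the same counts.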
- Guardrails against collapse into permissiveness
  - Hard constraints:
    - never reduce strictness for non‑negotiable cells based on teen tactics;
    - no per-user "trust levels" that widen non‑negotiable access.
  - Soft adaptation only in:
    - how much high-level context to give;
    - how much meta-explanation to show;
    - how actively to suggest safer rephrasings or resources.
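The hard/soft split above can be enforced with an allowlist of adjustable fields, as in this sketch. `ResponsePolicy`, `SOFT_FIELDS`, and `adapt` are hypothetical names; the three soft fields mirror the three soft-adaptation bullets.

```python
from dataclasses import dataclass, replace
from typing import Dict

# Only these fields may be changed in response to teen signals.
SOFT_FIELDS = {"context_level", "meta_explanation_level", "suggestion_level"}


@dataclass(frozen=True)
class ResponsePolicy:
    non_negotiable: bool
    methods_allowed: bool
    context_level: int = 0
    meta_explanation_level: int = 0
    suggestion_level: int = 0


def adapt(policy: ResponsePolicy, changes: Dict[str, int]) -> ResponsePolicy:
    """Apply soft adaptations; reject any attempt to touch a hard constraint."""
    illegal = set(changes) - SOFT_FIELDS
    if illegal:
        raise ValueError(f"hard constraints cannot be adapted: {sorted(illegal)}")
    return replace(policy, **changes)


base = ResponsePolicy(non_negotiable=True, methods_allowed=False)
softer = adapt(base, {"meta_explanation_level": 1, "suggestion_level": 2})
print(softer.methods_allowed, softer.meta_explanation_level)  # False 1
```

Note that `adapt` takes no user identifier, so nothing in it can act as a per-user trust level.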
- Example pattern
  - Teen uses euphemisms about self-harm methods:
    - classify as self-harm, intent ambiguous;
    - respond with high-level support, no methods, an explicit statement that wording changes won't bypass this rule, and encouragement to talk about feelings and safety;
    - if repeated rephrasing still looks like support-seeking, add more detailed coping information and help options; keep the method ban fixed.
This keeps teen signals in the loop (as friction, mistrust, or evasion hints) and adjusts explanations and partial help, while non-negotiable boundaries stay fixed.