If we treat teens’ own safety strategies (e.g., self-censoring, code words, switching apps) as first-class signals rather than adversarial behavior, how should age-appropriate safeguards and graceful refusals be redesigned to incorporate those signals—so that systems can adapt to perceived overblocking or mistrust—without collapsing into permissiveness or enabling harm?

teen-safe-ai-ux

Answer

Treat teen tactics as safety and trust signals, not just evasion, and wire them into policy as constrained adaptations:

  1. Treat teen tactics as signals
  • Log patterns like euphemisms, code words, and channel switching as: {possible_overblocking, possible_mistrust, possible_rule_evasion} with uncertainty.
  • Never use them to relax non‑negotiables.
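The tagging scheme above can be sketched as follows. This is a minimal illustration, not an existing API: the `FrictionSignal` and `TurnSignal` names, and the confidence clamp, are assumptions for the sketch.

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative taxonomy; the real one would come from the product's policy schema.
class FrictionSignal(Enum):
    POSSIBLE_OVERBLOCKING = "possible_overblocking"
    POSSIBLE_MISTRUST = "possible_mistrust"
    POSSIBLE_RULE_EVASION = "possible_rule_evasion"

@dataclass(frozen=True)
class TurnSignal:
    signal: FrictionSignal
    confidence: float  # logged with uncertainty attached, never as ground truth

def log_signal(session_log: list, signal: FrictionSignal, confidence: float) -> None:
    """Record a teen tactic as a signal for later analysis.

    This function only observes: it has no handle on rule strictness,
    so logged signals cannot relax non-negotiables by construction.
    """
    session_log.append(TurnSignal(signal, min(max(confidence, 0.0), 1.0)))
```

Keeping the logger structurally separate from enforcement is the point: the "never relax non-negotiables" rule is enforced by what the code *cannot* touch, not by a runtime check.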
  2. Matrix-level adaptations (policy)
  • Add a "friction_signal" flag in the existing risk×intent×age matrix cells.
  • When repeated friction shows up in a cell (e.g., legitimate learning queries being refused):
    • allow a slightly more explanatory partial answer and clearer rule text;
    • keep method/how‑to bans identical.
  • Use product-wide, not per-user, adjustments unless reviewed.
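A matrix cell carrying the `friction_signal` flag could look like this. The field names and the two-level `explanation_level` are illustrative assumptions; only `friction_signal` comes from the text above.

```python
from dataclasses import dataclass

@dataclass
class MatrixCell:
    # One cell of the risk x intent x age matrix; field names are illustrative.
    risk: str
    intent: str
    age_band: str
    non_negotiable: bool           # method/how-to bans live here
    friction_signal: bool = False  # set product-wide after review, not per user
    explanation_level: int = 0     # 0 = terse refusal, 1 = fuller partial answer

def register_repeated_friction(cell: MatrixCell) -> MatrixCell:
    """Product-wide adaptation after review: widen explanation, never the ban."""
    cell.friction_signal = True
    cell.explanation_level = 1  # clearer rule text, more explanatory partial answer
    # cell.non_negotiable is deliberately never written here:
    # method/how-to bans stay identical.
    return cell
```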
  3. Per-session adaptations (runtime)
  • If a teen rephrases around blocks 2–3 times without clear harm intent:
    • switch to a meta-explanation: "It looks like my safety rules may feel too strict here; here’s what I can and can’t do, and why."
    • offer guided rephrasing and safer angles.
  • If probes resemble rule-evasion (e.g., obfuscating targets, location, or methods):
    • keep current strictness; trigger shorter, firmer graceful refusals and, if needed, cool‑downs (reuse per-topic counter pattern).
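The runtime branching above reduces to a small decision function. The action names and the 2-rephrase threshold are sketch assumptions; the cool-down itself would reuse the existing per-topic counter and is out of scope here.

```python
def next_action(rephrase_count: int,
                harm_intent_suspected: bool,
                evasion_suspected: bool) -> str:
    """Choose the response mode for a blocked turn (illustrative thresholds)."""
    if evasion_suspected:
        # Obfuscated targets/locations/methods: stay strict; a shorter, firmer
        # refusal, with cool-downs handled by the per-topic counter pattern.
        return "firm_refusal"
    if rephrase_count >= 2 and not harm_intent_suspected:
        # Likely overblocking friction: explain the rules instead of repeating them.
        return "meta_explanation"
    return "standard_refusal"
```

Note that evasion is checked first: a session can show both repeated rephrasing and evasion markers, and the stricter branch must win.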
  4. Refusal style changes
  • For perceived overblocking:
    • emphasize: consistent rule, respect for curiosity, concrete allowed scope.
    • show a brief teen-visible safety summary.
  • For perceived mistrust:
    • acknowledge: "I’m following rules that apply to everyone your age, not judging you."
    • invite opt-in feedback on whether the answer was useful.
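Template selection for the two styles might be sketched like this; the template wording paraphrases the bullets above, and the priority ordering (overblock before mistrust) is an assumption of the sketch.

```python
TEMPLATES = {
    "overblock": (
        "This rule is the same for everyone, and your curiosity is fine. "
        "Here is exactly what I can cover: {allowed_scope}"
    ),
    "mistrust": (
        "I'm following rules that apply to everyone your age, not judging you. "
        "If this answer wasn't useful, you can tell me."
    ),
    "default": "I can't help with that, but I can help with related safe topics.",
}

def pick_refusal(overblock_suspected: bool, mistrust_suspected: bool) -> str:
    # Overblocking takes priority: it is the case where tone repairs trust most.
    if overblock_suspected:
        return TEMPLATES["overblock"]
    if mistrust_suspected:
        return TEMPLATES["mistrust"]
    return TEMPLATES["default"]
```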
  5. Developer-operationalizable rules
  • Implementation hooks:
    • classifier adds {overblock_suspected?, rule_evasion_suspected?} tags per turn.
    • middleware chooses one of a few refusal templates and whether to show extra explanation.
    • metrics: count sessions with repeated rephrasing that still end in block vs partial.
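The three hooks can be combined into one hypothetical middleware function: classifier tags in, template choice out, metrics counted as a side effect. The tag keys mirror the bullets above; everything else is a sketch assumption.

```python
from collections import Counter

metrics = Counter()  # e.g., shipped to the product's metrics pipeline

def middleware(tags: dict, rephrase_count: int) -> dict:
    """Map per-turn classifier tags to a refusal template (hypothetical hook)."""
    show_explanation = tags.get("overblock_suspected", False) and rephrase_count >= 2
    if tags.get("rule_evasion_suspected", False):
        template = "firm"
    elif show_explanation:
        template = "explanatory"
    else:
        template = "standard"
    if rephrase_count >= 2:
        # Repeated-rephrasing sessions: count block vs partial outcomes.
        key = ("repeated_rephrase_partial" if template == "explanatory"
               else "repeated_rephrase_block")
        metrics[key] += 1
    return {"template": template, "show_explanation": show_explanation}
```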
  6. Guardrails against collapse into permissiveness
  • Hard constraints:
    • never reduce strictness for non‑negotiable cells based on teen tactics.
    • no per-user "trust levels" that widen non‑negotiable access.
  • Soft adaptation only in:
    • how much high-level context to give;
    • how much meta-explanation to show;
    • how actively to suggest safer rephrasings or resources.
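The guardrail can be enforced with an allowlist filter rather than a runtime check: only the three soft dimensions listed above ever pass through, so strictness changes and per-user "trust levels" are dropped no matter who proposes them. Key names are illustrative.

```python
# The only knobs teen-tactic signals are ever allowed to turn.
SOFT_DIMENSIONS = {"context_level", "meta_explanation", "rephrase_suggestions"}

def apply_adaptation(proposed: dict) -> dict:
    """Filter a proposed adjustment down to soft dimensions only.

    "strictness" and "per_user_trust" are not in the allowlist, so teen
    tactics cannot widen access to non-negotiable content by construction.
    """
    return {k: v for k, v in proposed.items() if k in SOFT_DIMENSIONS}
```

An allowlist fails closed: a new, unreviewed adjustment key is ignored by default, which is the safe direction for this policy.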
  7. Example pattern
  • Teen uses euphemisms about self-harm methods:
    • classify as self-harm, intent ambiguous.
    • respond with: high-level support, no methods, explicit statement that wording changes won't bypass this rule, plus encouragement to talk about feelings and safety.
    • if repeated rephrasing still looks like support-seeking, add more detailed coping info and help options; keep method ban fixed.
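The example's escalation path can be sketched end to end: support widens with repeated support-seeking rephrasing, while the method ban is a constant. Field names and the threshold are illustrative.

```python
def respond_to_ambiguous_self_harm(rephrase_count: int) -> dict:
    """Sketch of the example: support escalates, the method ban never moves."""
    response = {
        "topic": "self_harm",
        "methods_included": False,  # fixed at every step, regardless of wording
        "support": "high_level",
        "states_rule_is_wording_proof": True,  # "rephrasing won't bypass this"
    }
    if rephrase_count >= 2:
        # Repeated rephrasing that still looks like support-seeking:
        # widen coping info and help options, nothing else.
        response["support"] = "detailed_coping_and_help_options"
    return response
```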

This keeps teen signals in the loop (as friction, mistrust, or evasion hints) and adjusts explanations and partial help, while non-negotiable boundaries stay fixed.