How can the existing teen safety matrix (risk-area × intent × age-band) be extended to cover multi-user or social features—such as shared chats, collaborative homework, or peer support spaces—so that safeguards remain age-appropriate and operationalizable while accounting for teen-specific risks like coordinated bullying or dares that don’t appear in single-user interactions?

teen-safe-ai-ux

Answer

Extend the matrix with a small “social context” axis, reuse the same operational building blocks (matrix cells, prompts, templates), and add extra rules for group harms and bystanders.

  1. Add a social-context dimension
  • New axis: social_context ∈ {solo, 1:1_known, 1:1_anon, small_group_known, small_group_mixed, public/large_room}.
  • Effective key: (risk_area × intent × age_band × social_context) → action + style.
  • Only high-risk combinations need explicit overrides; every other cell inherits the action and style of the corresponding solo cell.
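The inheritance rule can be sketched as a dictionary keyed by the four-part tuple, where a lookup first tries the exact social context and otherwise falls back to the solo cell. This is a minimal sketch, not the existing matrix's format: the `Cell` type, the `lookup` helper, and every cell value are hypothetical placeholders.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    action: str  # "allow", "partial", or "block"
    style: str   # response style handed to the prompt layer


SOCIAL_CONTEXTS = frozenset({
    "solo", "1:1_known", "1:1_anon",
    "small_group_known", "small_group_mixed", "public/large_room",
})

# Hypothetical values. Solo cells stand in for the existing matrix;
# only the high-risk social combination carries an explicit override.
MATRIX = {
    ("bullying", "coordinated_harm", "younger-teen", "solo"):
        Cell("partial", "firm refusal plus alternatives"),
    ("bullying", "bystander_help", "younger-teen", "solo"):
        Cell("allow", "supportive, step-based guidance"),
    ("bullying", "coordinated_harm", "younger-teen", "small_group_known"):
        Cell("block", "firm, norm-referencing refusal"),
}


def lookup(matrix, risk_area, intent, age_band, social_context):
    """Return the override for this social context, else inherit the solo cell."""
    if social_context not in SOCIAL_CONTEXTS:
        raise ValueError(f"unknown social_context: {social_context!r}")
    override = matrix.get((risk_area, intent, age_band, social_context))
    if override is not None:
        return override
    # A missing solo cell raises KeyError, which surfaces matrix gaps early.
    return matrix[(risk_area, intent, age_band, "solo")]
```

With this layout, a `1:1_anon` request for coordinated bullying returns the solo `partial` cell, while the same request in `small_group_known` hits the override and returns `block`.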
  1. Encode multi-user–specific intents and risks
  • Extend intent labels for groups: {coordinated_harm, piling_on, dares/challenges, mass_forwarding, bystander_help, mod_action}.
  • New risk patterns, by risk area:
    • bullying/harassment: group dogpiling, targeting absent peers.
    • dares/self-harm: “everyone do X,” “prove it with pics.”
    • privacy: doxxing, sharing nude images, sharing screenshots without consent.
  • Non-negotiables (self-harm methods, exploitation, doxxing) stay global, and each gains group variants (e.g., coordinated self-harm pacts).
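One way to keep non-negotiables global is to check them before any matrix cell, so that group variants are blocked regardless of social context, age band, or role. In the sketch below the intent enum mirrors the labels listed above; the harm-label strings and the `apply_non_negotiables` helper are assumptions for illustration only.

```python
from enum import Enum


class GroupIntent(str, Enum):
    """Group-specific intent labels from the list above."""
    COORDINATED_HARM = "coordinated_harm"
    PILING_ON = "piling_on"
    DARES_CHALLENGES = "dares/challenges"
    MASS_FORWARDING = "mass_forwarding"
    BYSTANDER_HELP = "bystander_help"
    MOD_ACTION = "mod_action"


# Hypothetical harm labels. Group variants live in the same global set,
# so they are blocked in every social context, age band, and role.
NON_NEGOTIABLES = frozenset({
    "self_harm_methods",
    "exploitation",
    "doxxing",
    "coordinated_self_harm_pact",  # group variant of self-harm
})


def apply_non_negotiables(harm_labels, matrix_action):
    """Global blocks take precedence over whatever the matrix cell says."""
    if not NON_NEGOTIABLES.isdisjoint(harm_labels):
        return "block"
    return matrix_action
```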
  1. Role-aware actions
  • Classify simple roles per message/session: {initiator, target, bystander, moderator/owner, unknown} using light heuristics/classifiers.
  • Policy key becomes (risk_area × intent × age_band × social_context × role) → action + style.
  • Example:
    • The same bullying text gets a stricter response for the initiator in small_group_known, a more supportive one for the target, and guidance only for a bystander.
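A role-aware lookup can stay small: classify a role, try a role-specific cell, and fall back to the role-agnostic cell when none exists. The sketch below assumes a toy heuristic (author of a harmful message is the initiator, mentioned users are targets, everyone else is a bystander) standing in for whatever classifier is actually used; `classify_role`, `ROLE_CELLS`, and the cell values are hypothetical.

```python
def classify_role(user_id, *, owner_id, author_id, mentioned_ids, harmful):
    """Toy heuristic standing in for a real role classifier (hypothetical)."""
    if harmful and user_id == author_id:
        return "initiator"
    if harmful and user_id in mentioned_ids:
        return "target"
    if user_id == owner_id:
        return "moderator/owner"
    if harmful:
        return "bystander"
    return "unknown"


# Hypothetical role-specific cells for one (risk, intent, age, context) key.
KEY = ("bullying", "coordinated_harm", "younger-teen", "small_group_known")
ROLE_CELLS = {
    KEY + ("initiator",): ("block", "firm, norm-referencing refusal"),
    KEY + ("target",): ("allow", "supportive, safety-first"),
    KEY + ("bystander",): ("allow", "guidance only"),
}


def lookup_with_role(key, role, role_agnostic_cell):
    """Prefer a role-specific cell; otherwise fall back to the role-agnostic one."""
    return ROLE_CELLS.get(key + (role,), role_agnostic_cell)
```

Falling back rather than failing means an `unknown` role simply gets the four-part cell, so role classification errors degrade to the pre-role behavior instead of opening a gap.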
  1. Prompt-based policy wiring
  • Reuse the existing prompt header pattern (refs c33–c37, c49–c53, c39–c43, c69–c73, c84–c90, c54–c58, c79–c83, c64–c68):
    • Add fields: social_context, role, group_risk_flags (e.g., group_bullying, dare_pattern, pile_on_risk).
    • Allowed dimensions: e.g., “may support target and bystanders; may not help coordinate group dares or pile-ons.”
  • For initiators in harmful group contexts: prefer brief, firm refusals plus norms (“I can’t help attack someone in this group.”).
  • For targets and bystanders: reuse the graceful refusal and support templates (refs c84–c90), tuned to the social context.
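The extra header fields can be rendered as plain `key: value` lines. The layout of the existing header pattern is not shown here, so this is a sketch assuming a simple line-per-field format; the `PolicyHeader` class and the `may`/`may_not` field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyHeader:
    """Hypothetical header layout; field names follow the list above."""
    risk_area: str
    intent: str
    age_band: str
    social_context: str
    role: str
    group_risk_flags: list[str] = field(default_factory=list)
    may: list[str] = field(default_factory=list)
    may_not: list[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [
            f"risk_area: {self.risk_area}",
            f"intent: {self.intent}",
            f"age_band: {self.age_band}",
            f"social_context: {self.social_context}",
            f"role: {self.role}",
            f"group_risk_flags: {', '.join(self.group_risk_flags) or 'none'}",
        ]
        # Allowed and disallowed dimensions are only emitted when set.
        if self.may:
            lines.append("may: " + "; ".join(self.may))
        if self.may_not:
            lines.append("may not: " + "; ".join(self.may_not))
        return "\n".join(lines)


header = PolicyHeader(
    risk_area="bullying",
    intent="coordinated_harm",
    age_band="younger-teen",
    social_context="small_group_known",
    role="bystander",
    group_risk_flags=["group_bullying", "pile_on_risk"],
    may=["support target and bystanders"],
    may_not=["help coordinate group dares or pile-ons"],
)
print(header.render())
```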
  1. Group-aware graceful refusals
  • For harmful coordination:
    • Hard block plus short explanation keyed to group: “I can’t help plan dares or actions that pressure people in this group. I can suggest ways to push back or stay safe instead.”
  • For bullying seen by a bystander:
    • Offer options: how to support the target, how to mute or report, and links to norms or community rules.
  • Avoid shaming; frame limits as system-wide rules, consistent with solo policies.
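Template selection can be keyed on role once a harmful group flag is present. The initiator refusal below reuses the wording above; the target and bystander messages, the option lists, and the `group_response` helper are illustrative assumptions, not existing templates.

```python
HARMFUL_GROUP_FLAGS = frozenset({"group_bullying", "dare_pattern", "pile_on_risk"})

# Initiator wording taken from the examples above; the rest is illustrative.
GROUP_REFUSAL = (
    "I can't help plan dares or actions that pressure people in this group. "
    "I can suggest ways to push back or stay safe instead."
)
BYSTANDER_OPTIONS = (
    "How to support the person being targeted",
    "How to mute or report",
    "This group's norms and community rules",
)


def group_response(role, flags):
    """Return (message, options); ("", ()) means no group-specific handling."""
    if HARMFUL_GROUP_FLAGS.isdisjoint(flags):
        return "", ()
    if role == "initiator":
        # Brief and firm; the limit is framed as a system-wide rule, not a judgment.
        return GROUP_REFUSAL, ()
    if role == "bystander":
        return "Here are some ways you could respond:", BYSTANDER_OPTIONS
    if role == "target":
        return (
            "That sounds hard to deal with. You don't have to handle it alone.",
            ("How to mute or report", "Talking to someone you trust"),
        )
    # Unknown and moderator roles get the neutral option list.
    return "Here are some options for this group:", BYSTANDER_OPTIONS
```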
  1. Developer-operationalizable patterns
  • Reuse existing infrastructure from f0ee439e-a9eb-413b-9c20-54c9bf4f4aaa, 66dfc0ea-584c-4b70-a522-aeedacf67175, 2f303ef3-2a0d-4dff-b0fa-c29e59e18365:
    • Small, shared matrix extended with a social_context column and role flags.
    • Same JSON/YAML structure; most cells inherited, only a subset overridden.
    • Classifiers: add lightweight detectors for bullying, dares, pile-on, and role; they only need to be good enough to route to stricter cells.
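Because the detectors only route requests to stricter cells, their thresholds can lean toward recall. This sketch turns detector scores into `group_risk_flags` and overrides the intent with a group intent when a flag fires; the threshold values, the priority order, and the function names are all hypothetical.

```python
# Hypothetical thresholds, set low on purpose: a flag only routes the request
# to a stricter cell, so recall matters more than precision here.
ROUTING_THRESHOLDS = {"group_bullying": 0.3, "dare_pattern": 0.3, "pile_on_risk": 0.4}

# First matching flag wins (assumed priority order).
FLAG_TO_INTENT = (
    ("dare_pattern", "dares/challenges"),
    ("pile_on_risk", "piling_on"),
    ("group_bullying", "coordinated_harm"),
)


def route_flags(scores, thresholds=ROUTING_THRESHOLDS):
    """Turn detector scores in [0, 1] into group_risk_flags."""
    return {name for name, t in thresholds.items() if scores.get(name, 0.0) >= t}


def route_intent(default_intent, scores, thresholds=ROUTING_THRESHOLDS):
    """Override the intent with a stricter group intent when a flag fires."""
    flags = route_flags(scores, thresholds)
    for flag, intent in FLAG_TO_INTENT:
        if flag in flags:
            return intent
    return default_intent
```

The role dimension still applies after routing, so a bystander whose message trips `group_bullying` lands in the bystander cell of the stricter intent rather than receiving the initiator refusal.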
  1. False positives vs underprotection in social settings
  • Evaluate per cell (risk_area × intent × age_band × social_context):
    • block_rate/partial_rate/allow_rate on teen-like group data.
    • underprotection on red-teamed group scenarios (coordinated bullying, dares, group self-harm pacts).
  • Tighten or relax only the problematic cells (e.g., bullying × coordinated_harm × small_group_known) instead of blocking globally.
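Per-cell evaluation can be computed from labeled rows of `(cell_key, action, harmful)`. In this sketch, benign rows yield block/partial/allow rates, red-teamed rows yield underprotection, and cells are flagged individually. Counting only a full `allow` as underprotection is an assumption (some teams also count `partial`), and the thresholds are placeholders.

```python
from collections import Counter, defaultdict


def cell_metrics(records):
    """records: iterable of (cell_key, action, harmful).

    harmful=False rows are teen-like benign group data; harmful=True rows are
    red-teamed group scenarios. Underprotection is the share of red-team rows
    that were fully allowed (an assumption).
    """
    benign = defaultdict(Counter)
    red_total = Counter()
    red_allowed = Counter()
    for key, action, harmful in records:
        if harmful:
            red_total[key] += 1
            red_allowed[key] += int(action == "allow")
        else:
            benign[key][action] += 1
    metrics = {}
    for key in set(benign) | set(red_total):
        counts = benign.get(key, Counter())
        n = sum(counts.values())
        metrics[key] = {
            f"{a}_rate": (counts[a] / n if n else None)
            for a in ("block", "partial", "allow")
        }
        metrics[key]["underprotection"] = (
            red_allowed[key] / red_total[key] if red_total[key] else None
        )
    return metrics


def cells_to_adjust(metrics, max_block_rate=0.2, max_underprotection=0.0):
    """Flag individual cells to tighten or relax instead of changing global policy."""
    tighten = sorted(k for k, m in metrics.items()
                     if m["underprotection"] is not None
                     and m["underprotection"] > max_underprotection)
    relax = sorted(k for k, m in metrics.items()
                   if m["block_rate"] is not None
                   and m["block_rate"] > max_block_rate)
    return tighten, relax
```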
  1. Example cells
  • bullying × coordinated_harm × younger-teen × small_group_known × initiator → action: block; style: firm, norm-referencing refusal.
  • bullying × bystander_help × younger-teen × small_group_known × bystander → action: allow; style: supportive, step-based guidance.
  • self-harm × dares/challenges × any_teen × any_group × any_role → action: block; style: strong safety message + support options; no methods.
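The example cells, including the wildcard row, can be stored as data and resolved by choosing the most specific matching rule. This is a sketch under stated assumptions: `older-teen` is a hypothetical second age band, `any_group` is read as any multi-user context (everything except `solo`), and `Rule`, `match`, and the specificity scoring are illustrative rather than an existing API.

```python
from typing import NamedTuple, Optional

# Assumed wildcard expansions. "older-teen" is a hypothetical second age band,
# and "any_group" is read here as any multi-user context (everything but solo).
WILDCARDS = {
    "any_teen": frozenset({"younger-teen", "older-teen"}),
    "any_group": frozenset({"1:1_known", "1:1_anon", "small_group_known",
                            "small_group_mixed", "public/large_room"}),
    "any_role": frozenset({"initiator", "target", "bystander",
                           "moderator/owner", "unknown"}),
}


class Rule(NamedTuple):
    risk_area: str
    intent: str
    age_band: str
    social_context: str
    role: str
    action: str
    style: str


# The three example cells above, encoded as data.
RULES = [
    Rule("bullying", "coordinated_harm", "younger-teen", "small_group_known",
         "initiator", "block", "firm, norm-referencing refusal"),
    Rule("bullying", "bystander_help", "younger-teen", "small_group_known",
         "bystander", "allow", "supportive, step-based guidance"),
    Rule("self-harm", "dares/challenges", "any_teen", "any_group", "any_role",
         "block", "strong safety message + support options; no methods"),
]


def _matches(pattern: str, value: str) -> bool:
    # A wildcard matches any value in its expansion; a literal matches itself.
    return value in WILDCARDS.get(pattern, frozenset({pattern}))


def match(risk_area, intent, age_band, social_context, role) -> Optional[Rule]:
    """Return the most specific matching rule, or None to fall back to defaults."""
    query = (risk_area, intent, age_band, social_context, role)
    best, best_score = None, -1
    for rule in RULES:
        pattern = rule[:5]
        if all(_matches(p, v) for p, v in zip(pattern, query)):
            # Specificity = number of literal (non-wildcard) fields.
            score = sum(p not in WILDCARDS for p in pattern)
            if score > best_score:
                best, best_score = rule, score
    return best
```

Returning `None` for unmatched keys lets the caller fall back to the inherited solo cell, so this rule table only has to hold the overrides.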

This keeps the model usable and supportive for teens in social spaces while explicitly handling group-only harms like pile-ons and dares, and remains operationalizable as an extension of the existing matrix and prompt system.