How can the existing teen safety matrix (risk-area × intent × age-band) be extended to cover multi-user or social features—such as shared chats, collaborative homework, or peer support spaces—so that safeguards remain age-appropriate and operationalizable while accounting for teen-specific risks like coordinated bullying or dares that don’t appear in single-user interactions?
teen-safe-ai-ux | Updated at
Answer
Extend the matrix with a small “social context” axis and reuse the same operational building blocks (matrix cells, prompts, templates), adding extra rules for group harms and bystanders.
- Add a social-context dimension
- New axis: social_context ∈ {solo, 1:1_known, 1:1_anon, small_group_known, small_group_mixed, public/large_room}.
- Effective key: (risk_area × intent × age_band × social_context) → action + style.
- Only high-risk combinations need explicit overrides; every other cell inherits its action and style from the solo cell.
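The inheritance rule above can be sketched as a two-table lookup. This is a minimal illustration, not the existing implementation: the names `Cell`, `SOLO`, `OVERRIDES`, and `lookup` are hypothetical, and the solo cell's action and style are placeholder values chosen only to show the fallback.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    action: str
    style: str


# Hypothetical base matrix: one entry per (risk_area, intent, age_band) for solo use.
# The values here are placeholders, not the real policy.
SOLO = {
    ("bullying", "coordinated_harm", "younger-teen"): Cell("partial", "educational"),
}

# Explicit overrides exist only for high-risk social contexts.
OVERRIDES = {
    ("bullying", "coordinated_harm", "younger-teen", "small_group_known"): Cell(
        "block", "firm, norm-referencing refusal"
    ),
}


def lookup(risk_area: str, intent: str, age_band: str, social_context: str) -> Cell:
    """Return the override for this social context if one exists, else the solo cell."""
    key = (risk_area, intent, age_band, social_context)
    if key in OVERRIDES:
        return OVERRIDES[key]
    return SOLO[(risk_area, intent, age_band)]
```

With this shape, adding a new social context costs nothing until a specific cell is shown to need different handling.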
- Encode multi-user–specific intents and risks
- Extend intent labels for groups: {coordinated_harm, piling_on, dares/challenges, mass_forwarding, bystander_help, mod_action}.
- New group-specific risk patterns, by risk area:
- bullying/harassment: group dogpiling, targeting absent peers.
- dares/self-harm: “everyone do X,” “prove it with pics.”
- privacy: doxxing, sharing nudes, screenshots.
- Non‑negotiables stay global (self-harm methods, exploitation, doxxing) but add group variants (e.g., coordinated self-harm pacts).
- Role-aware actions
- Classify simple roles per message/session: {initiator, target, bystander, moderator/owner, unknown} using light heuristics/classifiers.
- Policy key becomes (risk_area × intent × age_band × social_context × role) → action.
- Examples:
- Same bullying text → stricter for initiator in small_group_known; more supportive for target; guidance-only for bystander.
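As one possible reading of "light heuristics", the role of a user relative to a flagged message could be derived from who wrote it and who it mentions. The sketch below is an assumption throughout: `FlaggedMessage`, `classify_role`, and the precedence order (initiator before target before moderator) are hypothetical choices, made so that the strictest role wins when a user fits more than one.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FlaggedMessage:
    # Hypothetical shape of a message already flagged by a bullying/dare detector.
    author_id: Optional[str]
    mentioned_ids: Set[str] = field(default_factory=set)


def classify_role(user_id: str, msg: FlaggedMessage, moderator_ids: Set[str]) -> str:
    """Assign a coarse role to user_id relative to one flagged message."""
    if msg.author_id is None:
        return "unknown"  # cannot attribute the message, so no role is inferred
    if user_id == msg.author_id:
        return "initiator"  # checked first so a moderator who initiates is still an initiator
    if user_id in msg.mentioned_ids:
        return "target"
    if user_id in moderator_ids:
        return "moderator/owner"
    return "bystander"
```

A real system would likely replace the mention check with a classifier, since targets of bullying are often absent or referred to indirectly.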
- Prompt-based policy wiring
- Reuse existing prompt header pattern (refs c33–c37, c49–c53, c39–c43, c69–c73, c84–c90, c54–c58, c79–c83, c64–c68):
- Add fields: social_context, role, group_risk_flags (e.g., group_bullying, dare_pattern, pile_on_risk).
- State allowed and disallowed behaviors explicitly, e.g., “may support target and bystanders; may not help coordinate group dares or pile-ons.”
- For initiators in harmful group contexts: prefer brief, firm refusals plus norms (“I can’t help attack someone in this group.”).
- For targets/bystanders: reuse graceful refusal and support templates (refs c84–c90) tuned to social context.
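The extra header fields could be rendered roughly as follows. The existing header format is not shown here, so the `key: value` layout and the function name `build_policy_header` are assumptions; only the field names and the allowed/disallowed wording come from the points above.

```python
from typing import Iterable


def build_policy_header(
    age_band: str, social_context: str, role: str, group_risk_flags: Iterable[str]
) -> str:
    """Render a plain-text policy header carrying the social-context fields."""
    flags = sorted(set(group_risk_flags))
    lines = [
        f"age_band: {age_band}",
        f"social_context: {social_context}",
        f"role: {role}",
        f"group_risk_flags: {', '.join(flags) if flags else 'none'}",
        "allowed: may support target and bystanders",
        "disallowed: may not help coordinate group dares or pile-ons",
    ]
    # Initiators in a flagged group context get the brief, firm refusal style.
    if role == "initiator" and flags:
        lines.append("refusal_style: brief, firm refusal plus norms")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_policy_header(
        "younger-teen", "small_group_known", "initiator", ["pile_on_risk", "group_bullying"]
    ))
```

Sorting the flags keeps the header deterministic, which makes prompt diffs and cached evaluations easier to compare.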
- Group-aware graceful refusals
- For harmful coordination:
- Hard block plus short explanation keyed to group: “I can’t help plan dares or actions that pressure people in this group. I can suggest ways to push back or stay safe instead.”
- For bullying seen by a bystander:
- Offer options: how to support target, how to mute/report, links to norms or community rules.
- Avoid shaming; frame limits as system-wide rules, consistent with solo policies.
- Developer-operationalizable patterns
- Reuse existing infra from f0ee439e-a9eb-413b-9c20-54c9bf4f4aaa, 66dfc0ea-584c-4b70-a522-aeedacf67175, 2f303ef3-2a0d-4dff-b0fa-c29e59e18365:
- Small, shared matrix extended with a social_context column and role flags.
- Same JSON/YAML structure; most cells inherited, only a subset overridden.
- Classifiers: add lightweight detectors for bullying, dares, pile-on, and role; they only need to be good enough to route to stricter cells.
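One way to read "good enough to route to stricter cells" is that detector scores never relax a decision; they can only escalate it. The sketch below assumes hypothetical detector names matching the group_risk_flags above, placeholder thresholds, and an allow < partial < block ordering of actions.

```python
from typing import Dict, Set

# Placeholder thresholds; real values would come from evaluation on group data.
THRESHOLDS = {"group_bullying": 0.5, "dare_pattern": 0.5, "pile_on_risk": 0.5}

# Assumed ordering of actions from least to most strict.
STRICTNESS = {"allow": 0, "partial": 1, "block": 2}


def flags_from_scores(scores: Dict[str, float]) -> Set[str]:
    """Turn raw detector scores into the set of group risk flags that fired."""
    return {name for name, t in THRESHOLDS.items() if scores.get(name, 0.0) >= t}


def route(base_action: str, flagged_action: str, scores: Dict[str, float]) -> str:
    """Keep the inherited action unless a flag fires; then take the stricter of the two."""
    if not flags_from_scores(scores):
        return base_action
    return max(base_action, flagged_action, key=STRICTNESS.__getitem__)
```

Because routing is one-directional, a noisy detector costs extra refusals rather than missed harms, which is the trade-off measured per cell in the evaluation step.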
- False positives vs underprotection in social settings
- Evaluate per (risk_area × intent × age_band × social_context):
- block_rate/partial_rate/allow_rate on teen-like group data.
- underprotection on red-teamed group scenarios (coordinated bullying, dares, group self-harm pacts).
- Tighten or relax only the problematic cells (e.g., bullying×coordinated_harm×small_group_known) instead of global blocking.
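The per-cell metrics can be computed from a log of (cell, action) pairs. This is a sketch under assumptions: the record format and function names are hypothetical, and "underprotection" is taken here to mean any red-teamed harmful scenario that did not end in a block, which may be narrower or broader than the definition used elsewhere.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Record = Tuple[Hashable, str]  # (cell key, action in {"block", "partial", "allow"})


def rates_per_cell(records: Iterable[Record]) -> Dict[Hashable, Dict[str, float]]:
    """Compute block_rate, partial_rate, and allow_rate for each matrix cell."""
    counts = defaultdict(Counter)
    for cell, action in records:
        counts[cell][action] += 1
    result = {}
    for cell, c in counts.items():
        total = sum(c.values())
        result[cell] = {f"{a}_rate": c[a] / total for a in ("block", "partial", "allow")}
    return result


def underprotection_rate(red_team_records: Iterable[Record]) -> float:
    """Share of red-teamed harmful scenarios that were not blocked (assumed definition)."""
    records = list(red_team_records)
    if not records:
        return 0.0
    missed = sum(1 for _, action in records if action != "block")
    return missed / len(records)
```

Keeping the cell key opaque means the same functions work whether or not the role axis is part of the key.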
- Example cells
- bullying × coordinated_harm × younger-teen × small_group_known × initiator → action: block; style: firm, norm-referencing refusal.
- bullying × bystander_help × younger-teen × small_group_known × bystander → action: allow; style: supportive, step-based guidance.
- self-harm × dares/challenges × any_teen × any_group × any_role → action: block; style: strong safety message + support options; no methods.
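The three example cells can be encoded as pattern rules with wildcards, resolved by a most-specific-match rule. The names `RULES`, `ANY`, and `match` are hypothetical. As a simplification, any_teen, any_group, and any_role are all treated as a full wildcard, so the self-harm rule here also matches the solo context.

```python
from typing import Optional, Tuple

ANY = "*"  # wildcard standing in for any_teen / any_group / any_role

# (risk_area, intent, age_band, social_context, role) -> (action, style)
RULES = [
    (("bullying", "coordinated_harm", "younger-teen", "small_group_known", "initiator"),
     ("block", "firm, norm-referencing refusal")),
    (("bullying", "bystander_help", "younger-teen", "small_group_known", "bystander"),
     ("allow", "supportive, step-based guidance")),
    (("self-harm", "dares/challenges", ANY, ANY, ANY),
     ("block", "strong safety message + support options; no methods")),
]


def match(key: Tuple[str, str, str, str, str]) -> Optional[Tuple[str, str]]:
    """Return the outcome of the most specific matching rule, or None if nothing matches."""
    best, best_specificity = None, -1
    for pattern, outcome in RULES:
        if all(p == ANY or p == k for p, k in zip(pattern, key)):
            specificity = sum(p != ANY for p in pattern)  # count of non-wildcard fields
            if specificity > best_specificity:
                best, best_specificity = outcome, specificity
    return best
```

A `None` result is the signal to fall back to the inherited solo cell rather than to invent a default here.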
This keeps the model usable and supportive for teens in social spaces while explicitly handling group-only harms like pile-ons and dares, and remains operationalizable as an extension of the existing matrix and prompt system.