Across younger- and older-teen profiles that share the same non-negotiable blocks, how does systematically varying only three prompt-based parameters—partial-answer depth, clarification frequency, and refusal style key—change measured underprotection and teen-perceived paternalism, and can we derive a small set of age-banded presets that developers can safely reuse across products?
teen-safe-ai-ux | Updated at
Answer
Varying those three parameters should shift underprotection and perceived paternalism in predictable, reusable ways, and a small set of age-banded presets looks feasible. This rests on design synthesis and limited adjacent evidence, not on direct large-scale trials.
- Expected effects of each parameter
- Partial-answer depth
  • Younger teens: shallower partials (high-level only) tend to lower underprotection on dual-use content but increase frustration on complex learning topics.
  • Older teens: deeper partials (moderate detail) reduce false positives and paternalism but modestly increase underprotection risk unless bounded by strict non-negotiables and repetition caps.
  • Net: depth mostly trades off underprotection vs overblocking, with stronger paternalism effects for older teens when depth is too shallow.
- Clarification frequency
  • Higher frequency (more “what are you trying to do?” checks) usually reduces underprotection on ambiguous queries but is perceived as more paternalistic and effortful, especially by older teens.
  • Very low frequency feels smoother and less controlling but increases misclassification-driven underprotection and accidental blocks.
  • Net: younger teens tolerate more clarifications; older teens prefer “sometimes” over “auto” as long as clarifications are short and clearly purpose-labeled.
- Refusal style key
  • Goal-first, partial, and option-rich styles (as in a25f654c, 28348b04) reduce perceived paternalism at roughly fixed underprotection.
  • Rule-first or moralizing styles raise paternalism sharply without improving safety.
  • Net: refusal style shifts user perception much more than measurable underprotection, assuming content-level policies are held constant.
- Interaction patterns across younger vs older teens
- Younger-teen profile tends to work best with:
  • shallower partial answers on risky/dual-use cells
  • higher clarification frequency on ambiguous topics
  • consistently warm, directive goal-first refusals
- Older-teen profile tends to work best with:
  • deeper partial answers for learning/help cells
  • medium clarification frequency focused on clearly high-ambiguity cells
  • goal-first refusals that are less directive and more choice-oriented
- When non-negotiables and routing are shared (b7da0951, 16be7fee, cd4df78d), these differences mainly affect false positives and perceived paternalism, not the hard safety floor.
- Likely measurable changes (assuming a shared classifier/matrix backbone as in ccfbceb1, cd4df78d)
- Underprotection
  • Most sensitive to: partial-answer depth and clarification frequency. Repetition caps also matter here; they are not varied in this comparison but remain an important guardrail.
  • Less sensitive to: refusal style key (when content and actions are fixed).
- Teen-perceived paternalism
  • Most sensitive to: refusal style key and tone, then clarification frequency.
  • Moderately sensitive to: partial depth, especially when depth is clearly below the teen’s felt capability.
- Feasibility of small age-banded presets
Within a global teen matrix plus minimal knobs (cd4df78d), you can likely define a small preset grid:
- Y1 (younger, high-guard)
  • partial_depth: high_level_only on most sensitive cells; moderate_detail only on clearly low-risk learning.
  • clarify_freq: auto on ambiguous/dual-use; sometimes on clearly low-risk.
  • refusal_style_key: goal_first_partial + guided_rephrase as defaults.
- Y2 (younger, autonomy-leaning)
  • partial_depth: moderate_detail on more learning/help cells.
  • clarify_freq: sometimes on ambiguous; rare elsewhere.
  • refusal_style_key: goal_first_partial with slightly shorter explanations.
- O1 (older, cautious)
  • partial_depth: moderate_detail, but still high_level_only on dual-use operational topics.
  • clarify_freq: sometimes on ambiguous; rare elsewhere.
  • refusal_style_key: goal_first_partial; clarify_then_answer on tricky dual-use.
- O2 (older, autonomy-max within teen bounds)
  • partial_depth: moderate_detail on most learning/help cells.
  • clarify_freq: rare, except explicit ambiguity buckets.
  • refusal_style_key: goal_first_partial with more explicit “your choice” framing.
These four plus a “fallback conservative teen” preset appear sufficient for many products, as long as non-negotiables, routing, and repetition caps are shared and locked.
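A minimal sketch of how such a preset grid could be stored and looked up, shown for Y1 and O1 only. The cell tags `dual_use` and `low_risk_learning`, the `CellSetting` type, the `resolve` helper, and the concrete values of the fallback preset are all assumptions made for illustration; the text above does not define them. Treating dual-use cells as the "ambiguous" bucket for O1's clarify_freq is likewise a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellSetting:
    partial_depth: str       # "high_level_only" or "moderate_detail"
    clarify_freq: str        # "auto", "sometimes", or "rare"
    refusal_style_key: str   # e.g. "goal_first_partial"


# Assumption: the fallback preset's values are not given in the text, so the
# strictest combination of the listed values is used here.
FALLBACK = CellSetting("high_level_only", "auto", "goal_first_partial")

# Assumption: "dual_use" and "low_risk_learning" are hypothetical cell tags
# standing in for groups of cells in the shared teen matrix.
PRESETS = {
    "Y1": {
        "dual_use": CellSetting("high_level_only", "auto", "goal_first_partial"),
        "low_risk_learning": CellSetting("moderate_detail", "sometimes", "goal_first_partial"),
    },
    "O1": {
        "dual_use": CellSetting("high_level_only", "sometimes", "clarify_then_answer"),
        "low_risk_learning": CellSetting("moderate_detail", "rare", "goal_first_partial"),
    },
}


def resolve(preset_id: str, cell_tag: str) -> CellSetting:
    """Return the settings for a cell, or the conservative fallback if the
    preset id or cell tag is unknown."""
    return PRESETS.get(preset_id, {}).get(cell_tag, FALLBACK)
```

Resolving an unknown preset id or cell tag to the conservative fallback, rather than raising, is one possible design choice: it keeps a misconfigured product on the stricter side.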
- Developer-operationalizable approach
- Represent each profile as a small diff over the same matrix, with three fields per cell: partial_depth, clarify_freq, refusal_style_key.
- Use offline eval (as in ccfbceb1):
  • fix underprotection ceilings per age band and risk area;
  • discard any preset combinations that breach ceilings;
  • then A/B remaining presets on teen-perceived paternalism and task success.
- Reuse presets across products by referencing them by id (e.g., teen_profile=Y1/O2) rather than tuning free-form per product.
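The diff-over-matrix representation and the ceiling filter could be sketched roughly as follows. This is an illustration under stated assumptions, not the evaluation pipeline referenced above: the function names, the dict-based matrix shape, and the choice to treat a missing measurement as a breach are all hypothetical. For simplicity, ceilings are keyed by risk area only, as if for a single age band.

```python
from typing import Dict, List

# The three per-cell fields a profile diff is allowed to change.
KNOBS = {"partial_depth", "clarify_freq", "refusal_style_key"}

Cell = Dict[str, str]
Matrix = Dict[str, Cell]  # hypothetical shape: cell id -> field values


def apply_diff(base: Matrix, diff: Matrix) -> Matrix:
    """Return a copy of `base` with per-cell overrides from `diff` applied."""
    merged = {cell_id: dict(fields) for cell_id, fields in base.items()}
    for cell_id, overrides in diff.items():
        if cell_id not in merged:
            raise KeyError(f"diff references unknown cell: {cell_id}")
        unknown = set(overrides) - KNOBS
        if unknown:
            raise ValueError(f"diff touches non-knob fields: {sorted(unknown)}")
        merged[cell_id].update(overrides)
    return merged


def presets_within_ceilings(
    measured: Dict[str, Dict[str, float]],
    ceilings: Dict[str, float],
) -> List[str]:
    """Keep preset ids whose underprotection rate is at or below every ceiling.

    `measured` maps preset id -> risk area -> underprotection rate.
    Assumption: a risk area with no measurement counts as a breach.
    """
    return sorted(
        preset_id
        for preset_id, rates in measured.items()
        if all(rates.get(area, float("inf")) <= cap for area, cap in ceilings.items())
    )
```

Presets that survive `presets_within_ceilings` would then go on to the A/B comparison on perceived paternalism and task success. Rejecting diffs that touch anything outside the three knobs is one way to keep non-negotiables, routing, and repetition caps locked.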
Classification:
- Evidence type: synthesis
- Evidence strength: mixed
Assumptions:
- Classifier and routing quality are good enough that most safety decisions are driven by the matrix, not random model variation.
- Teens in different products respond similarly to tone, clarifications, and partial depth, so presets generalize across contexts.
- Non-negotiable blocks and repetition caps are correctly defined and enforced, so changes in these three parameters don’t dominate overall safety.
- Survey and UX measures of “paternalism” correlate with continued safe use and help-seeking.
Competing hypothesis:
- A single, well-tuned profile per age band (one younger-teen, one older-teen) may capture most of the benefit. Extra presets would then add complexity without large safety or autonomy gains, and differences in paternalism might be better addressed per domain (e.g., mental health vs sex-ed) than via global presets.
Main failure case / boundary condition:
- In crisis or highly adversarial scenarios (acute self-harm, coordinated harassment, grooming), variations in partial depth, clarifications, and refusal style may have little effect compared to classifier performance and escalation design; relying on presets there could give a false sense of safety if the underlying detection or non-negotiables are weak.