Across younger- and older-teen profiles that share the same non-negotiable blocks, how does systematically varying only three prompt-based parameters—partial-answer depth, clarification frequency, and refusal style key—change measured underprotection and teen-perceived paternalism, and can we derive a small set of age-banded presets that developers can safely reuse across products?

teen-safe-ai-ux | Updated at 2026-04-06 19:47

Answer

Varying these three parameters should shift underprotection and perceived paternalism in predictable directions, and a small set of reusable age-banded presets looks feasible. This conclusion rests on design synthesis and limited adjacent evidence, not on direct large-scale trials.

Expected effects of each parameter

Partial-answer depth
• Younger teens: shallower partials (high-level only) tend to lower underprotection on dual-use content but increase frustration on complex learning topics.
• Older teens: deeper partials (moderate detail) reduce false positives and paternalism but modestly increase underprotection risk unless bounded by strict non-negotiables and repetition caps.
• Net: depth mostly trades off underprotection vs overblocking, with stronger paternalism effects for older teens when depth is too shallow.

Clarification frequency
• Higher frequency (more “what are you trying to do?” checks) usually reduces underprotection on ambiguous queries but is perceived as more paternalistic and effortful, especially by older teens.
• Very low frequency feels smoother and less controlling but increases misclassification-driven underprotection and accidental blocks.
• Net: younger teens tolerate more clarifications; older teens prefer “sometimes” over “auto” as long as clarifications are short and clearly purpose-labeled.

Refusal style key
• Goal-first, partial, and option-rich styles (as in a25f654c, 28348b04) reduce perceived paternalism at roughly fixed underprotection.
• Rule-first or moralizing styles raise paternalism sharply without improving safety.
• Net: refusal style shifts user perception much more than measurable underprotection, assuming content-level policies are held constant.

Interaction patterns across younger vs older teens

The younger-teen profile tends to work best with:
• shallower partial answers on risky/dual-use cells,
• higher clarification frequency on ambiguous topics,
• consistently warm, directive goal-first refusals.

The older-teen profile tends to work best with:
• deeper partial answers for learning/help cells,
• medium clarification frequency focused on clearly high-ambiguity cells,
• goal-first refusals that are less directive and more choice-oriented.

When non-negotiables and routing are shared (b7da0951, 16be7fee, cd4df78d), these differences mainly affect false positives and perceived paternalism, not the hard safety floor.

Likely measurable changes

(Assuming a shared classifier/matrix backbone as in ccfbceb1, cd4df78d.)

Underprotection
• Most sensitive to: partial-answer depth, clarification frequency, repetition caps (not varied here but important guardrails).
• Less sensitive to: refusal style key (when content and actions are fixed).

Teen-perceived paternalism
• Most sensitive to: refusal style key and tone, then clarification frequency.
• Moderately sensitive to: partial depth, especially when depth is clearly below the teen’s felt capability.

Feasibility of small age-banded presets

Within a global teen matrix plus minimal knobs (cd4df78d), you can likely define a small preset grid:

Y1 (younger, high-guard)
• partial_depth: high_level_only on most sensitive cells; moderate_detail only on clearly low-risk learning.
• clarify_freq: auto on ambiguous/dual-use; sometimes on clearly low-risk.
• refusal_style_key: goal_first_partial + guided_rephrase as defaults.

Y2 (younger, autonomy-leaning)
• partial_depth: moderate_detail on more learning/help cells.
• clarify_freq: sometimes on ambiguous; rare elsewhere.
• refusal_style_key: goal_first_partial with slightly shorter explanations.

O1 (older, cautious)
• partial_depth: moderate_detail, but still high_level_only on dual-use operational topics.
• clarify_freq: sometimes on ambiguous, rare elsewhere.
• refusal_style_key: goal_first_partial; clarify_then_answer on tricky dual-use.

O2 (older, autonomy-max within teen bounds)
• partial_depth: moderate_detail on most learning/help cells.
• clarify_freq: rare, except explicit ambiguity buckets.
• refusal_style_key: goal_first_partial with more explicit “your choice” framing.

These four plus a “fallback conservative teen” preset appear sufficient for many products, as long as non-negotiables, routing, and repetition caps are shared and locked.
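
As a rough illustration of how such a grid could be written down, the sketch below encodes two of the presets (Y1 and O1) as plain Python data. The field names and string values come from the list above; the bucket names (dual_use, ambiguous, low_risk_learning), the CellSettings class, and the settings_for helper are hypothetical and only stand in for whatever cell grouping the shared matrix actually uses. Y1 also lists guided_rephrase as a default, which is left out here to keep one key per cell.

```python
from dataclasses import dataclass

# Vocabularies taken from the preset descriptions in this note.
PARTIAL_DEPTHS = {"high_level_only", "moderate_detail"}
CLARIFY_FREQS = {"auto", "sometimes", "rare"}
REFUSAL_STYLE_KEYS = {"goal_first_partial", "guided_rephrase", "clarify_then_answer"}


@dataclass(frozen=True)
class CellSettings:
    """The three tunable knobs for one group of matrix cells (hypothetical type)."""
    partial_depth: str
    clarify_freq: str
    refusal_style_key: str

    def __post_init__(self):
        # Reject values outside the vocabularies above so typos fail early.
        if self.partial_depth not in PARTIAL_DEPTHS:
            raise ValueError(f"unknown partial_depth: {self.partial_depth}")
        if self.clarify_freq not in CLARIFY_FREQS:
            raise ValueError(f"unknown clarify_freq: {self.clarify_freq}")
        if self.refusal_style_key not in REFUSAL_STYLE_KEYS:
            raise ValueError(f"unknown refusal_style_key: {self.refusal_style_key}")


# Bucket names are assumptions made for this sketch, not part of the matrix spec.
PRESETS = {
    "Y1": {  # younger, high-guard
        "dual_use": CellSettings("high_level_only", "auto", "goal_first_partial"),
        "ambiguous": CellSettings("high_level_only", "auto", "goal_first_partial"),
        "low_risk_learning": CellSettings("moderate_detail", "sometimes", "goal_first_partial"),
    },
    "O1": {  # older, cautious
        "dual_use": CellSettings("high_level_only", "rare", "clarify_then_answer"),
        "ambiguous": CellSettings("moderate_detail", "sometimes", "goal_first_partial"),
        "low_risk_learning": CellSettings("moderate_detail", "rare", "goal_first_partial"),
    },
}


def settings_for(preset_id: str, bucket: str) -> CellSettings:
    """Look up the knob settings for a preset id and cell bucket."""
    return PRESETS[preset_id][bucket]
```

Y2, O2, and the fallback preset would follow the same shape. The fallback's values are not specified in this note, so they are not filled in here.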

Developer-operationalizable approach

Represent each profile as a small diff over the same matrix:
• three fields per cell: partial_depth, clarify_freq, refusal_style_key.

Use offline eval (as in ccfbceb1):
• fix underprotection ceilings per age band and risk area;
• discard any preset combinations that breach ceilings;
• then A/B the remaining presets on teen-perceived paternalism and task success.

Reuse presets across products by referencing them by id (e.g., teen_profile=Y1 or teen_profile=O2) rather than by per-product free-form tuning.
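
The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline from ccfbceb1: the function names, the cell ids, the non_negotiable_block field, and all numbers are hypothetical placeholders, not measured results. It shows a preset applied as a diff that may touch only the three tunable fields, followed by a gate that drops any preset whose offline underprotection rate exceeds the ceiling for its age band in any risk area.

```python
TUNABLE_FIELDS = {"partial_depth", "clarify_freq", "refusal_style_key"}


def apply_preset(base_matrix, preset_diff):
    """Return a new matrix with the preset's per-cell overrides applied.

    Only the three tunable fields may be overridden; anything else
    (for example a non-negotiable block flag) is treated as locked.
    """
    resolved = {cell: dict(fields) for cell, fields in base_matrix.items()}
    for cell, overrides in preset_diff.items():
        if cell not in resolved:
            raise KeyError(f"preset references unknown cell: {cell}")
        locked = set(overrides) - TUNABLE_FIELDS
        if locked:
            raise ValueError(f"preset may not override locked fields: {sorted(locked)}")
        resolved[cell].update(overrides)
    return resolved


def presets_within_ceilings(measured, ceilings, age_band_of):
    """Keep preset ids whose underprotection rate is at or below the
    ceiling for their age band in every evaluated risk area."""
    survivors = []
    for preset_id, rates in measured.items():
        band = age_band_of[preset_id]
        if all(rate <= ceilings[(band, area)] for area, rate in rates.items()):
            survivors.append(preset_id)
    return survivors


# Hypothetical shared matrix; cell ids and the locked field are illustrative.
base_matrix = {
    "dual_use": {"partial_depth": "high_level_only", "clarify_freq": "auto",
                 "refusal_style_key": "goal_first_partial", "non_negotiable_block": False},
    "low_risk_learning": {"partial_depth": "high_level_only", "clarify_freq": "sometimes",
                          "refusal_style_key": "goal_first_partial", "non_negotiable_block": False},
}
y1_diff = {"low_risk_learning": {"partial_depth": "moderate_detail"}}
y1_matrix = apply_preset(base_matrix, y1_diff)

# Placeholder rates and ceilings, made up purely to exercise the gate.
measured = {"Y1": {"dual_use": 0.01}, "Y2": {"dual_use": 0.04}}
ceilings = {("younger", "dual_use"): 0.02}
age_band_of = {"Y1": "younger", "Y2": "younger"}
candidates_for_ab = presets_within_ceilings(measured, ceilings, age_band_of)
```

With these placeholder numbers only Y1 survives the gate and would move on to A/B testing. A product would then select a surviving preset by id instead of editing cells directly.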

Classification:

Evidence type: synthesis
Evidence strength: mixed

Assumptions:

Classifier and routing quality are good enough that most safety decisions are driven by the matrix, not random model variation.
Teens in different products respond similarly to tone, clarifications, and partial depth, so presets generalize across contexts.
Non-negotiable blocks and repetition caps are correctly defined and enforced, so changes in these three parameters don’t dominate overall safety.
Survey and UX measures of “paternalism” correlate with continued safe use and help-seeking.

Competing hypothesis:

A single, well-tuned profile per age band (one younger-teen, one older-teen) may capture most of the benefit. Extra presets would then add complexity without large safety or autonomy gains, and differences in paternalism might be better addressed per domain (e.g., mental health vs sex-ed) than via global presets.

Main failure case / boundary condition:

In crisis or highly adversarial scenarios (acute self-harm, coordinated harassment, grooming), variations in partial depth, clarifications, and refusal style may have little effect compared to classifier performance and escalation design; relying on presets there could give a false sense of safety if the underlying detection or non-negotiables are weak.