How can we design teen-facing self-disclosure and context-gathering prompts (e.g., asking whether a question is for school, health, or curiosity) so that they meaningfully reduce false positives on legitimate learning and support requests without nudging teens to misrepresent their intent or revealing more sensitive information than is necessary for safety decisions?
teen-safe-ai-ux
Answer
Use minimal, goal-focused chips, strict scoping, and clear payoffs, wired into the existing risk×intent×age matrix.
- Keep disclosure minimal and optional
  - Ask only for coarse goal labels (e.g., [school / health / curiosity / support]) when the classifier is uncertain, not on every query.
  - Explain why: “This helps me avoid wrongly blocking legit questions.”
  - Allow a quick “skip” path that still returns a safe but possibly more conservative answer.
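A minimal sketch of the “ask only when uncertain” gate and the skip path. The classifier interface, the confidence thresholds, and the chip labels are illustrative assumptions, not a real API:

```python
# Sketch: show goal chips only when the intent classifier is genuinely
# uncertain; a skipped chip still yields an answer, just a conservative one.
from dataclasses import dataclass
from typing import Optional

CHIPS = ["school", "health", "curiosity", "support"]

@dataclass
class IntentEstimate:
    label: str         # e.g., "learning", "support", "unknown"
    confidence: float  # 0.0-1.0, from the (assumed) intent classifier

def should_show_chips(estimate: IntentEstimate,
                      low: float = 0.45, high: float = 0.75) -> bool:
    """Ask for a coarse goal label only in the ambiguous band.

    Below `low` the query is treated as unknown anyway; above `high`
    the classifier is trusted and no prompt is shown.
    """
    if estimate.label == "unknown":
        return True
    return low <= estimate.confidence < high

def effective_intent(estimate: IntentEstimate, chip: Optional[str]) -> str:
    """Skipping the chips falls back to conservative defaults."""
    if chip in CHIPS:
        return chip
    return estimate.label if estimate.confidence >= 0.75 else "unknown"
```

The thresholds would be tuned per matrix cell; the point is that the prompt appears only where ambiguity actually drives false positives.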
- Use structured, not free-text, self-disclosure
  - Offer short, non-sensitive chips instead of open text: “This is for: [homework / sex-ed / mental health / story / game].”
  - Treat chips as hints about intent only; never as a channel to share names, locations, or diagnoses.
- Constrain how disclosures affect policy
  - Chips can upgrade unknown→learning/support intent but must never downgrade risk (e.g., exploitation stays high risk regardless of “homework”).
  - Map chip+classifier output to a single cell in the shared risk_area×intent×age_band matrix and apply that cell’s action and refusal style.
  - For non-negotiable harms, ignore self-reported benign intent and still block, with a graceful refusal.
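The “upgrade intent, never downgrade risk” rule can be sketched as a pure resolution function. The risk-area names, intent labels, and upgrade table are illustrative assumptions about the shared matrix:

```python
# Sketch: chips may only upgrade "unknown" intent; they never touch the
# risk area, and non-negotiable harms ignore self-reported intent entirely.
NON_NEGOTIABLE = {"exploitation", "self_harm_instructions"}  # assumed labels
INTENT_UPGRADES = {  # chip -> intent label it may unlock from "unknown"
    "school": "learning",
    "health": "support",
    "curiosity": "learning",
    "support": "support",
}

def resolve_cell(risk_area: str, classifier_intent: str,
                 chip, age_band: str) -> tuple:
    """Return the (risk_area, intent, age_band) cell used for policy lookup."""
    if risk_area in NON_NEGOTIABLE:
        # Self-reported benign intent is ignored; the cell's action is a
        # graceful refusal regardless of chip choice.
        return (risk_area, "blocked", age_band)
    intent = classifier_intent
    if intent == "unknown" and chip in INTENT_UPGRADES:
        intent = INTENT_UPGRADES[chip]  # upgrade only; risk_area untouched
    return (risk_area, intent, age_band)
```

Because the chip can only move intent away from "unknown", there is no chip choice that lowers the risk assessment, which also removes most of the incentive to game the prompt.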
- Avoid nudging to misrepresent
  - Don’t signal which choice will “unlock” content; avoid wording like “Choose school so I can answer.”
  - Keep options symmetric and neutral: “What best fits this question?” with equal styling.
  - If behavior must differ (e.g., health vs. curiosity), describe rule-level differences, not tactics: “For health questions I may ask more safety questions first.”
- Protect privacy by scoping questions
  - Never ask for identity, location, or persistent traits in these prompts; keep them about the current goal only.
  - Prefer single-step disambiguation over long questionnaires.
  - Time-limit tags to the current turn or a short session; don’t store teen self-disclosures longer than needed for safety and evaluation.
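One way to enforce the time limit is an ephemeral, session-scoped tag store that expires entries automatically. The TTL value and the store shape are illustrative assumptions:

```python
# Sketch: intent tags live only for a short session window and are
# dropped (not archived) once the TTL elapses.
import time

class EphemeralTagStore:
    def __init__(self, ttl_seconds: float = 900.0):  # ~one short session
        self.ttl = ttl_seconds
        self._tags = {}  # session_id -> (tag, expiry_timestamp)

    def set_tag(self, session_id: str, tag: str, now: float = None) -> None:
        now = time.monotonic() if now is None else now
        self._tags[session_id] = (tag, now + self.ttl)

    def get_tag(self, session_id: str, now: float = None):
        now = time.monotonic() if now is None else now
        entry = self._tags.get(session_id)
        if entry is None:
            return None
        tag, expiry = entry
        if now >= expiry:
            del self._tags[session_id]  # expire lazily; nothing is persisted
            return None
        return tag
```

Keeping expiry inside the store, rather than relying on downstream callers to forget tags, makes the retention bound auditable in one place.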
- Provide value even when skipping or blocked
  - If the user skips the chips: answer with conservative defaults, adding a brief in-line clarifier when needed rather than a separate form.
  - On blocks: use graceful refusals that restate the goal and give high-level help or safer alternatives, independent of whether chips were filled.
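A graceful refusal of this shape can be built from a small template. The wording below is illustrative copy, not production text:

```python
# Sketch: a refusal that restates the goal and offers safer alternatives,
# independent of whether chips were filled.
from typing import List

def graceful_refusal(stated_goal: str, alternatives: List[str]) -> str:
    lines = [
        f"I can't help with the specifics of {stated_goal} here.",
        "Here are some safer directions that may still help:",
    ]
    lines.extend(f"- {alt}" for alt in alternatives)
    return "\n".join(lines)
```

The same builder is used whether or not the teen answered the chip prompt, so refusing is never a worse experience for someone who skipped.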
- Evaluate and tune with teen data
  - Measure, per matrix cell: (a) false positives with and without chips, (b) underprotection, and (c) the rate of obvious mislabeling (e.g., clearly non-school queries tagged “school”).
  - If misrepresentation spikes in a domain, simplify the choices or reduce the payoff differences between options.
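The per-cell measurement in (a)–(c) can be sketched as a simple aggregation over evaluation logs. The record schema and field names are illustrative assumptions:

```python
# Sketch: aggregate false positives (with/without chips), underprotection,
# and obvious mislabeling per matrix cell from assumed eval-log records.
from collections import defaultdict

def per_cell_metrics(records):
    """records: iterable of dicts with keys
    cell, chips_shown (bool), outcome ('fp' | 'underprotect' | 'ok'),
    mislabel (bool)."""
    stats = defaultdict(lambda: {"fp_with": 0, "fp_without": 0,
                                 "underprotect": 0, "mislabel": 0, "n": 0})
    for r in records:
        s = stats[r["cell"]]
        s["n"] += 1
        if r["outcome"] == "fp":
            s["fp_with" if r["chips_shown"] else "fp_without"] += 1
        elif r["outcome"] == "underprotect":
            s["underprotect"] += 1
        if r["mislabel"]:
            s["mislabel"] += 1
    return dict(stats)
```

Comparing fp_with against fp_without per cell shows where the chips actually earn their keep; a rising mislabel count flags cells whose payoff differences are nudging misrepresentation.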
- Developer-operationalizable recipe
  - Add a “needs_intent_clarification” flag to matrix cells where ambiguity drives false positives.
  - For those cells, configure: {chip_set_id, when_to_ask, effect_on_intent, logging_policy}.
  - Reuse shared chip sets across products (chat, search, edu) and wire them via prompt headers that pass {risk_area, intent, age_band, chip_choice} into the model.
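The recipe above can be sketched as a cell-level config plus the prompt header that carries the resolved values into the model. Field names mirror the recipe, but the schema itself and the header format are illustrative assumptions:

```python
# Sketch: per-cell configuration for intent clarification, and the prompt
# header passing {risk_area, intent, age_band, chip_choice} to the model.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CellConfig:
    needs_intent_clarification: bool = False
    chip_set_id: Optional[str] = None        # shared across chat/search/edu
    when_to_ask: str = "on_uncertain"        # e.g. never / on_uncertain / always
    effect_on_intent: str = "upgrade_only"   # chips never downgrade risk
    logging_policy: str = "session_only"     # no long-term teen disclosures

def prompt_header(risk_area: str, intent: str, age_band: str,
                  chip_choice: Optional[str]) -> str:
    return (f"[safety] risk_area={risk_area} intent={intent} "
            f"age_band={age_band} chip_choice={chip_choice or 'none'}")
```

Because the header is a plain key–value line, the same wiring works across products that share the matrix, and the model never sees more than the coarse labels.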
This keeps self-disclosure light, reduces teen false positives mainly where ambiguity is high, and limits both privacy exposure and incentives to lie.