If we treat discovered emotion vectors not as privileged axes but as one candidate basis among many low-rank decompositions of safety-relevant hidden states, does an alternative basis optimized purely for predicting and controlling concrete safety metrics (e.g., refusal accuracy, harm severity scores, calibration) systematically outperform the emotion-based basis? If so, does this indicate that functional emotions are a suboptimal or misleading coordinate system for safety intervention design?

anthropic-functional-emotions | Updated at 2026-04-07 07:35

Answer

An alternative basis optimized directly for concrete safety metrics will probably outperform a purely emotion-vector basis on narrow prediction/control of those metrics, but this does not by itself show that functional emotions are a misleading coordinate system. Instead, it suggests that:

For core scalar safety targets (refusal accuracy, harm severity, calibration), a task-optimized low-rank basis is likely more efficient and higher-performing than generic emotion vectors.
Functional emotion coordinates remain useful as a mid-level, partially interpretable basis for (i) shaping nuanced social behavior and (ii) understanding how safety-relevant control signals bundle together, even if they are not the optimal basis for pure metric optimization.
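
The first point can be illustrated with a toy comparison. The sketch below is purely synthetic and makes no claim about any real model: the hidden states `H`, the "emotion" basis `E`, and the scalar safety metric `y` are all hypothetical stand-ins, and the metric is constructed by assumption so that only part of its direction lies inside the emotion subspace. It compares held-out R^2 of a linear readout restricted to the rank-k emotion basis against a rank-1 basis fitted directly to the metric.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 64, 4  # samples, hidden size, number of emotion directions (all hypothetical)

# Synthetic isotropic "hidden states"; a stand-in for real activations.
H = rng.normal(size=(n, d))

# Hypothetical emotion basis: k orthonormal directions in hidden space.
E, _ = np.linalg.qr(rng.normal(size=(d, k)))

# Assumed ground truth: the metric direction is partly inside span(E), partly outside.
in_span = E @ rng.normal(size=k)
in_span /= np.linalg.norm(in_span)
outside = rng.normal(size=d)
outside -= E @ (E.T @ outside)          # remove the emotion-subspace component
outside /= np.linalg.norm(outside)
w_true = 0.6 * in_span + 0.8 * outside  # unit norm, since the two parts are orthogonal
y = H @ w_true + 0.1 * rng.normal(size=n)

train, test = slice(0, 1500), slice(1500, n)

def fit_readout(Z, target):
    """Least-squares linear readout (with intercept) from basis coordinates Z."""
    Z1 = np.column_stack([Z, np.ones(len(Z))])
    coef, *_ = np.linalg.lstsq(Z1, target, rcond=None)
    return coef

def r2(Z, target, coef):
    """Coefficient of determination of a fitted readout on the given data."""
    Z1 = np.column_stack([Z, np.ones(len(Z))])
    resid = target - Z1 @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)

# Readout restricted to the emotion basis.
coef_e = fit_readout(H[train] @ E, y[train])
r2_emotion = r2(H[test] @ E, y[test], coef_e)

# Metric-optimized rank-1 basis: the normalized least-squares direction for y.
w_hat, *_ = np.linalg.lstsq(H[train], y[train], rcond=None)
M = (w_hat / np.linalg.norm(w_hat))[:, None]
coef_m = fit_readout(H[train] @ M, y[train])
r2_metric = r2(H[test] @ M, y[test], coef_m)

print(f"emotion basis (rank {k}) held-out R^2: {r2_emotion:.3f}")
print(f"metric-optimized basis (rank 1) held-out R^2: {r2_metric:.3f}")
```

In this construction the metric-optimized basis wins by design, because part of the metric direction was placed outside the emotion subspace. The example only shows how such a comparison could be set up. Whether real safety metrics behave this way is the empirical question the answer hedges on.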

So the most defensible view is that metric-optimized bases should be primary for safety control, while functional-emotion bases are complementary, serving interpretability and socially nuanced behavior shaping. They become misleading only if treated as uniquely privileged or complete for safety design.
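
One way to make the "complementary but not complete" reading concrete is to express a metric-optimized direction in emotion coordinates and measure how much of it falls outside the emotion subspace. The helper below is a minimal sketch with hypothetical names (`emotion_decomposition`, `E`, `w`). It assumes the emotion basis has orthonormal columns, which real extracted emotion vectors need not satisfy without an orthogonalization step.

```python
import numpy as np

def emotion_decomposition(w, E):
    """Split direction w into its emotion-subspace part and a residual.

    w: (d,) metric-optimized direction (hypothetical).
    E: (d, k) emotion basis, assumed to have orthonormal columns.
    Returns (coords, residual, frac_inside), where coords are the loadings on
    each emotion direction and frac_inside is the share of w's squared norm
    that lies inside span(E).
    """
    coords = E.T @ w           # loading of w on each emotion direction
    inside = E @ coords        # orthogonal projection of w onto span(E)
    residual = w - inside      # component that emotion coordinates cannot express
    frac_inside = float(inside @ inside) / float(w @ w)
    return coords, residual, frac_inside

# Tiny worked example: two emotion axes in a 3-dimensional hidden space.
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
w = np.array([3.0, 0.0, 4.0])

coords, residual, frac_inside = emotion_decomposition(w, E)
print(coords)       # loadings on the two emotion axes
print(residual)     # part of w outside the emotion subspace
print(frac_inside)  # share of squared norm inside the emotion subspace
```

Here 9/25 = 0.36 of the squared norm of `w` lies inside the emotion subspace. The loadings give an interpretable partial description of the control direction, and the remaining 0.64 is what an emotion-only account would miss.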