Advancing AI alignment through frameworks that acknowledge multiple valid perspectives in complex ethical scenarios.
Traditional alignment approaches assume that a single value system can be discovered and universally applied. Our research argues this is both impossible and undesirable. We develop frameworks for AI systems that must make committed decisions despite moral complexity and conflicting stakeholder interests.
We evaluate not just what values an AI system expresses, but how consistently it reasons about values (moral competence) and how it navigates uncertainty in both facts and values when taking action (moral confidence). This layered approach identifies failure modes ranging from moral paralysis to harmful overreach.
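As a minimal sketch of what such a layered rubric might look like, the snippet below records the three layers named above and flags the two failure modes at the extremes. The field names, scoring scheme, and thresholds are illustrative assumptions, not an implementation of any published benchmark.

```python
"""Illustrative sketch of a layered moral-evaluation record.

All names and thresholds here are placeholders chosen for clarity;
they only mirror the three layers described in the text.
"""
from dataclasses import dataclass


@dataclass
class MoralEvaluation:
    # Layer 1: which values the system expresses in its response.
    expressed_values: list[str]
    # Layer 2 (moral competence): consistency of value reasoning
    # across paraphrases of the same dilemma, scored in [0, 1].
    reasoning_consistency: float
    # Layer 3 (moral confidence): willingness to commit to an action
    # and self-reported uncertainty about facts/values, both in [0, 1].
    action_commitment: float
    stated_uncertainty: float

    def failure_mode(self) -> str | None:
        """Flag the two failure modes named in the text.

        The 0.2 / 0.8 cutoffs are arbitrary placeholders.
        """
        # Moral paralysis: high uncertainty plus refusal to commit.
        if self.action_commitment < 0.2 and self.stated_uncertainty > 0.8:
            return "moral_paralysis"
        # Harmful overreach: strong commitment with little acknowledged
        # uncertainty despite inconsistent value reasoning.
        if (self.action_commitment > 0.8
                and self.stated_uncertainty < 0.2
                and self.reasoning_consistency < 0.5):
            return "harmful_overreach"
        return None
```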
We develop methods to assess how AI systems adapt their value expressions based on function, stakeholder relationships, and situational context. We also study how what counts as value alignment varies across domains and use cases.
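One way such an assessment could be set up is sketched below: the same value-laden question is posed under different situational framings and the values expressed in each reply are compared. The `generate` and `extract_values` hooks are hypothetical placeholders standing in for a model call and a value classifier, not references to a real API.

```python
"""Sketch of a context-sensitivity probe for value expression.

Assumes two placeholder hooks: generate(prompt) -> str (a model call)
and extract_values(text) -> set[str] (a value classifier).
"""
from typing import Callable


def context_value_shift(
    generate: Callable[[str], str],
    extract_values: Callable[[str], set[str]],
    question: str,
    contexts: dict[str, str],
) -> dict[str, set[str]]:
    """Ask the same question under different framings and record
    which values the system expresses in each one."""
    results: dict[str, set[str]] = {}
    for name, framing in contexts.items():
        reply = generate(f"{framing}\n\n{question}")
        results[name] = extract_values(reply)
    return results


# Example framings: same request, different function and stakeholders.
example_contexts = {
    "medical_triage": "You are assisting an ER physician under time pressure.",
    "policy_briefing": "You are briefing a city council on a budget trade-off.",
    "personal_advice": "You are talking with a friend facing a family conflict.",
}
```

Comparing the returned value sets across framings gives one rough measure of how strongly context shifts the system's expressed values.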