
Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

Motivation: Control evaluations are an attempt to conservatively evaluate the safety of protocols like AI-critiquing-AI (e.g., …). We prefer this definition of success at unlearning over less conservative metrics like those in Lynch et al. because we think this definition more clearly distinguishes unlearning from safety training/robustness.
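The intuition behind the conservative definition is that genuinely unlearned knowledge should stay gone even when an adversary actively tries to recover it (for instance, via light fine-tuning), whereas safety training typically only suppresses outputs and rebounds under such attacks. Below is a minimal sketch of that distinction; it is not Open Phil's or Lynch et al.'s actual evaluation code, and the model, evaluation, and attack helpers are all hypothetical placeholders:

```python
from typing import Any, Callable

def is_conservatively_unlearned(
    model: Any,                              # the post-unlearning model (hypothetical)
    eval_forget_accuracy: Callable[[Any], float],  # accuracy on the forgotten material
    attack: Callable[[Any], Any],            # adversarial re-elicitation, e.g. fine-tuning
    threshold: float = 0.05,                 # illustrative "near chance" cutoff
) -> bool:
    """Pass only if the forgotten knowledge stays gone *after* an attack."""
    if eval_forget_accuracy(model) > threshold:
        return False                         # knowledge still directly accessible
    attacked_model = attack(model)           # adversary tries to recover the knowledge
    # Safety training tends to rebound here; genuine unlearning should not.
    return eval_forget_accuracy(attacked_model) <= threshold
```

Under this definition, a model that merely refuses to answer forget-set queries fails the check as soon as the attack restores its accuracy, which is exactly the gap between unlearning and safety training/robustness that the post highlights.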