The Theoretical Reward Learning Research Agenda: Introduction and Motivation
The AI Alignment Forum
FEBRUARY 28, 2025
Concretely, this research agenda involves answering questions such as: What is the right method for expressing goals and instructions to AI systems? The next question is whether a given reward learning method is guaranteed to converge to a reward function that is sufficiently accurate in this sense.