The Theoretical Reward Learning Research Agenda: Introduction and Motivation

The AI Alignment Forum

Concretely, this research agenda involves answering questions such as: What is the right method for expressing goals and instructions to AI systems (with reward functions being one notable candidate)? Which specification learning algorithms are guaranteed to converge to a good specification? And how can failure modes such as Goodhart's law be avoided?

Other Papers About the Theory of Reward Learning

The AI Alignment Forum

The third and final class of tasks I look at in this paper is a new category of objectives that I refer to as modal objectives, where the agent is given an instruction expressed not just in terms of what does happen along a given trajectory, but also in terms of what could happen. This paper is also discussed in this post (Paper 4).
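To make the does-happen versus could-happen distinction concrete, here is a minimal sketch, not the paper's formalism: the deterministic toy MDP, the one-step `reachable` helper, and the `unsafe` set are all illustrative assumptions. It contrasts an ordinary trajectory-based return, which depends only on the realised path, with a modal objective that also depends on the states the agent could have reached along the way.

```python
# Hypothetical illustration of a "modal" objective (not the paper's definitions).
from typing import Callable, Dict, List, Set, Tuple

State = str
Action = str
# Deterministic transition function for a toy MDP: (state, action) -> next state.
Transitions = Dict[Tuple[State, Action], State]

def trajectory_return(trajectory: List[Tuple[State, Action]],
                      reward: Callable[[State, Action], float]) -> float:
    """Ordinary objective: depends only on what happens along the trajectory."""
    return sum(reward(s, a) for s, a in trajectory)

def reachable(state: State, transitions: Transitions) -> Set[State]:
    """States the agent *could* reach from `state` (one step, for simplicity)."""
    return {s2 for (s1, _a), s2 in transitions.items() if s1 == state}

def modal_objective(trajectory: List[Tuple[State, Action]],
                    transitions: Transitions,
                    unsafe: Set[State]) -> bool:
    """Modal objective: at every visited state, no unsafe state could be
    reached in one step -- a condition on possibilities, not just on the
    realised path."""
    return all(reachable(s, transitions).isdisjoint(unsafe)
               for s, _a in trajectory)

# Two trajectories visiting the same states get the same ordinary return,
# but the modal objective can distinguish them via the surrounding dynamics.
T: Transitions = {("s0", "go"): "s1", ("s1", "go"): "s2", ("s1", "risk"): "bad"}
traj = [("s0", "go"), ("s1", "go")]
print(trajectory_return(traj, lambda s, a: 1.0))  # 2.0
print(modal_objective(traj, T, unsafe={"bad"}))   # False: "bad" is reachable from s1
```

The point of the sketch is that `modal_objective` cannot be recovered from any function of the realised trajectory alone: it inspects the transition structure around each visited state, which is exactly the sense in which modal tasks reference what could happen.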