The Theoretical Reward Learning Research Agenda: Introduction and Motivation

The AI Alignment Forum

Concretely, this research agenda involves answering questions such as: What is the right method for expressing goals and instructions to AI systems (with reward functions being one notable candidate)? Which specification learning algorithms are guaranteed to converge to a good specification? And how can failure modes such as Goodhart's law be avoided?

Other Papers About the Theory of Reward Learning

The AI Alignment Forum

The third and final class of tasks I look at in this paper is a new category of objectives that I refer to as modal objectives, where the agent is given an instruction expressed not just in terms of what does happen along a given trajectory, but also in terms of what could happen. This paper is also discussed in this post (Paper 4).
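To make the does-happen versus could-happen distinction concrete, here is a minimal sketch, not the paper's formalism: the deterministic toy MDP, the one-step `reachable` helper, and the `unsafe` set are all illustrative assumptions. It contrasts an ordinary trajectory-based return, which depends only on the realised path, with a modal objective that also depends on the states the agent could have reached along the way.

```python
# Hypothetical illustration of a "modal" objective (not the paper's definitions).
from typing import Callable, Dict, List, Set, Tuple

State = str
Action = str
# Deterministic transition function for a toy MDP: (state, action) -> next state.
Transitions = Dict[Tuple[State, Action], State]

def trajectory_return(trajectory: List[Tuple[State, Action]],
                      reward: Callable[[State, Action], float]) -> float:
    """Ordinary objective: depends only on what happens along the trajectory."""
    return sum(reward(s, a) for s, a in trajectory)

def reachable(state: State, transitions: Transitions) -> Set[State]:
    """States the agent *could* reach from `state` (one step, for simplicity)."""
    return {s2 for (s1, _a), s2 in transitions.items() if s1 == state}

def modal_objective(trajectory: List[Tuple[State, Action]],
                    transitions: Transitions,
                    unsafe: Set[State]) -> bool:
    """Modal objective: at every visited state, no unsafe state could be
    reached in one step -- a condition on possibilities, not just on the
    realised path."""
    return all(reachable(s, transitions).isdisjoint(unsafe)
               for s, _a in trajectory)

# Two trajectories visiting the same states get the same ordinary return,
# but the modal objective can distinguish them via the surrounding dynamics.
T: Transitions = {("s0", "go"): "s1", ("s1", "go"): "s2", ("s1", "risk"): "bad"}
traj = [("s0", "go"), ("s1", "go")]
print(trajectory_return(traj, lambda s, a: 1.0))  # 2.0
print(modal_objective(traj, T, unsafe={"bad"}))   # False: "bad" is reachable from s1
```

The point of the sketch is that `modal_objective` cannot be recovered from any function of the realised trajectory alone: it inspects the transition structure around each visited state, which is exactly the sense in which modal tasks reference what could happen.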