
The Theoretical Reward Learning Research Agenda: Introduction and Motivation

The AI Alignment Forum

Concretely, this research agenda involves answering questions such as: What is the right method for expressing goals and instructions to AI systems? Similarly, a complete answer to question (3) would be a (pseudo)metric d on the space of all reward functions that quantifies their similarity.
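For reference, the substance of the agenda lies in choosing a particular d; any candidate would at minimum satisfy the standard pseudometric axioms. For all reward functions $R_1, R_2, R_3$:

\[
d(R_1, R_2) \ge 0, \qquad d(R_1, R_1) = 0, \qquad d(R_1, R_2) = d(R_2, R_1), \qquad d(R_1, R_3) \le d(R_1, R_2) + d(R_2, R_3).
\]

Unlike a full metric, a pseudometric may assign distance zero to distinct reward functions, which is natural in this setting: distinct reward functions can be equivalent in the sense of inducing the same preferences over policies.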


Stanford AI Lab Papers and Talks at NeurIPS 2021

Stanford AI Lab Blog

Kochenderfer. Contact: philhc@stanford.edu. Links: Paper. Keywords: deep learning or neural networks, sparsity and feature selection, variational inference, (application) natural language and text processing. Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss. Authors: Jeff Z.


Other Papers About the Theory of Reward Learning

The AI Alignment Forum

The third and final class of tasks I look at in this paper is a new category of objectives that I refer to as modal objectives, where the agent is given an instruction expressed not just in terms of what does happen along a given trajectory, but also in terms of what could happen. This paper is discussed in more detail in this post.
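To give a rough sense of the idea (a minimal sketch, not the paper's formalism; the trajectory format and the helpers reachable_states, is_unsafe, and step_reward are assumptions made purely for this example), a modal objective might score a trajectory both by the reward actually collected and by which states could have been reached along the way:

def modal_return(trajectory, reachable_states, is_unsafe, step_reward):
    """Score a trajectory with a simple illustrative modal objective.

    trajectory: list of (state, action) pairs that actually occurred.
    reachable_states(s): set of states reachable from s (the "could happen" part).
    is_unsafe(s): predicate marking states the instruction forbids.
    step_reward(s, a): ordinary per-step reward (the "does happen" part).
    """
    total = 0.0
    for state, action in trajectory:
        total += step_reward(state, action)
        # Modal component: penalise visiting any state from which an unsafe
        # state is merely reachable, even if it is never actually entered.
        if any(is_unsafe(s) for s in reachable_states(state)):
            total -= 1.0
    return total

The point of the sketch is only that the second term depends on counterfactual reachability rather than on the realised trajectory alone.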


Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

We place more value on displaying inputs on which LLMs take undesirable or misaligned actions without being instructed to do so (as in Järviniemi and Hubinger, and Meinke et al.) than we do on inputs that include instructions to do some harmful task (as in Andriushchenko et al., Kumar et al., OpenAI, and Yuan et al.).