
The Theoretical Reward Learning Research Agenda: Introduction and Motivation

The AI Alignment Forum

Concretely, this research agenda involves answering questions such as: What is the right method for expressing goals and instructions to AI systems? Similarly, a complete answer to question (3) would be a (pseudo)metric d on the space of all reward functions that quantifies their similarity.
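For reference, the substance of the agenda lies in choosing a particular d; any candidate would at minimum satisfy the standard pseudometric axioms. For all reward functions $R_1, R_2, R_3$:

\[
d(R_1, R_2) \ge 0, \qquad d(R_1, R_1) = 0, \qquad d(R_1, R_2) = d(R_2, R_1), \qquad d(R_1, R_3) \le d(R_1, R_2) + d(R_2, R_3).
\]

Unlike a full metric, a pseudometric may assign distance zero to distinct reward functions, which is natural in this setting: distinct reward functions can be equivalent in the sense of inducing the same preferences over policies.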


Stanford AI Lab Papers and Talks at NeurIPS 2021

Stanford AI Lab Blog

Kochenderfer. Contact: philhc@stanford.edu. Links: Paper. Keywords: deep learning or neural networks, sparsity and feature selection, variational inference, (application) natural language and text processing. Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss. Authors: Jeff Z.


Other Papers About the Theory of Reward Learning

The AI Alignment Forum

The third and final class of tasks I look at in this paper is a new category of objectives that I refer to as modal objectives, where the agent is given an instruction expressed not just in terms of what does happen along a given trajectory, but also in terms of what could happen. This paper is discussed in more detail in this post.
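To give a rough sense of the idea (a minimal sketch, not the paper's formalism; the trajectory format and the helpers reachable_states, is_unsafe, and step_reward are assumptions made purely for this example), a modal objective might score a trajectory both by the reward actually collected and by which states could have been reached along the way:

def modal_return(trajectory, reachable_states, is_unsafe, step_reward):
    """Score a trajectory with a simple illustrative modal objective.

    trajectory: list of (state, action) pairs that actually occurred.
    reachable_states(s): set of states reachable from s (the "could happen" part).
    is_unsafe(s): predicate marking states the instruction forbids.
    step_reward(s, a): ordinary per-step reward (the "does happen" part).
    """
    total = 0.0
    for state, action in trajectory:
        total += step_reward(state, action)
        # Modal component: penalise visiting any state from which an unsafe
        # state is merely reachable, even if it is never actually entered.
        if any(is_unsafe(s) for s in reachable_states(state)):
            total -= 1.0
    return total

The point of the sketch is only that the second term depends on counterfactual reachability rather than on the realised trajectory alone.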


Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

We place more value on displaying inputs on which LLMs take undesirable or misaligned actions without being instructed to do so (as in Järviniemi and Hubinger, and Meinke et al.) than we do on inputs that include instructions to do some harmful task (as in Andriushchenko et al., Kumar et al., OpenAI, and Yuan et al.).