
The Theoretical Reward Learning Research Agenda: Introduction and Motivation

The AI Alignment Forum

Finally, in the last post, I will also provide some resources for anyone who wants to contribute to this (or similar) research, in the form of both open problems and some thoughts on how these problems could be approached. We should only trust a reward learning method that is at least reasonably robust to such errors.


Other Papers About the Theory of Reward Learning

The AI Alignment Forum

To me, the main takeaway from this paper is that we should be careful with the assumption that the basic RL setting really captures everything that we intuitively consider to be part of the problem domain of sequential decision-making. This paper is discussed in more detail in this post. Alternatively, see the main paper.



Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

This guide provides an opinionated overview of recent work and open problems across areas like adversarial testing, model transparency, and theoretical approaches to AI alignment. Motivation: Two lines of recent work have looked for undesirable behaviors in LLMs, approaching the problem from two different angles: Andriushchenko et al.


Moving from Red AI to Green AI, Part 1: How to Save the Environment and Reduce Your Hardware Costs

DataRobot

They are used for different applications, but they nonetheless suggest that developments in infrastructure (access to GPUs and TPUs for computing) and in deep learning theory have led to very large models. We believe in using efficiency metrics in machine learning software.
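To make the idea of efficiency metrics concrete, here is a minimal illustrative sketch (not from the article): one simple "Green AI"-style metric is accuracy achieved per unit of compute, so that a small model can be compared fairly against a much larger one. The function name and the numbers are hypothetical.

```python
def efficiency_score(accuracy: float, flops: float) -> float:
    """Accuracy achieved per gigaFLOP of training compute (higher is greener).

    This is an illustrative metric, not one prescribed by the article.
    """
    return accuracy / (flops / 1e9)

# Hypothetical comparison: a small model trained with 2 TFLOPs vs. a
# large model trained with 3 PFLOPs, with a small accuracy gap.
small = efficiency_score(accuracy=0.91, flops=2e12)
large = efficiency_score(accuracy=0.93, flops=3e15)

# The small model is far more compute-efficient despite slightly lower accuracy.
assert small > large
```

A metric like this makes the trade-off explicit: the large model's accuracy gain comes at several orders of magnitude more compute per point of accuracy.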


Google at NeurIPS 2022

Google Research AI blog

A Workshop for Algorithmic Efficiency in Practical Neural Network Training. Workshop Organizers include: Zachary Nado, George Dahl, Naman Agarwal, Aakanksha Chowdhery. Invited Speakers include: Aakanksha Chowdhery, Priya Goyal. Human in the Loop Learning (HiLL) Workshop. Organizers include: Fisher Yu, Vittorio Ferrari. Invited Speakers include: Dorsa (..)


AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

The AI Alignment Forum

And the way you said it just then, it sounded more like the first one: here's a nice new metric of how good your mechanistic explanation is. You could imagine saying, "Oh, we figured out that the difficulties in finding… It was still kind of hard, and our lives would be easier if we solved sub-problems X, Y, and Z."
