Stanford AI Lab Papers and Talks at NeurIPS 2021

Stanford AI Lab Blog

Kochenderfer. Contact: philhc@stanford.edu. Links: Paper. Keywords: deep learning or neural networks, sparsity and feature selection, variational inference, (application) natural language and text processing. Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss. Authors: Jeff Z.

Other Papers About the Theory of Reward Learning

The AI Alignment Forum

The first of these is the preference structures given by multi-objective RL, where the agent is given multiple reward functions $R_1, R_2, R_3, \dots$ and has to find a policy that achieves a good trade-off of those rewards according to some specified criterion. This paper is discussed in more detail in this post.
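To make the multi-objective setup concrete, here is a minimal sketch, assuming each candidate policy has already been summarized by its expected return under each of $R_1, R_2, R_3$. The policy names and return values are hypothetical, and linear scalarization with fixed weights is just one example of a "specified criterion"; it is not claimed to be the criterion used in the paper. The Pareto-front filter shows which policies are never worth picking under any criterion that respects "more of every reward is better".

```python
from typing import Dict, List, Sequence, Tuple

# Expected return of a policy under each reward function (R_1, R_2, R_3, ...).
Returns = Tuple[float, ...]


def dominates(a: Returns, b: Returns) -> bool:
    """True if `a` is at least as good as `b` on every reward and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Returns]) -> List[str]:
    """Names of policies that no other candidate Pareto-dominates."""
    return [
        name
        for name, r in candidates.items()
        if not any(dominates(other, r) for o, other in candidates.items() if o != name)
    ]


def scalarize(r: Returns, weights: Sequence[float]) -> float:
    """Linear scalarization: one possible trade-off criterion among many."""
    return sum(w * x for w, x in zip(weights, r))


def best_policy(candidates: Dict[str, Returns], weights: Sequence[float]) -> str:
    """Policy maximizing the weighted sum of returns (hypothetical criterion)."""
    return max(candidates, key=lambda name: scalarize(candidates[name], weights))


# Hypothetical expected returns under (R_1, R_2, R_3) for four policies.
policies = {
    "pi_a": (1.0, 0.0, 0.5),
    "pi_b": (0.0, 1.0, 0.5),
    "pi_c": (0.6, 0.6, 0.6),
    "pi_d": (0.5, 0.5, 0.4),  # worse than pi_c on every reward
}

if __name__ == "__main__":
    print(pareto_front(policies))
    print(best_policy(policies, weights=(1 / 3, 1 / 3, 1 / 3)))
    print(best_policy(policies, weights=(1.0, 0.0, 0.0)))
```

Changing the weights changes which Pareto-optimal policy is selected, which is exactly why the choice of criterion matters in this setting.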

Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

We're interested in funding research that leverages knowledge about the structure of a model's activation space to efficiently estimate the probability of some particular rare output, even when that probability is too small to estimate by random sampling (Wen et al., …). See this section for more details.
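Why random sampling breaks down for rare outputs can be seen in a toy example. The sketch below is plain importance sampling on a one-dimensional Gaussian, not an activation-space method: the standard normal "output", the threshold of 5, and the function names are illustrative assumptions. With 100,000 samples, naive Monte Carlo almost never sees an event of probability about $3 \times 10^{-7}$, while shifting the sampling distribution toward the rare region and reweighting by the likelihood ratio recovers the probability accurately.

```python
import math
import random


def exact_tail(threshold: float) -> float:
    """P(X > threshold) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def naive_mc(threshold: float, n: int, rng: random.Random) -> float:
    """Plain Monte Carlo: fraction of N(0, 1) samples above the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def importance_sampling(threshold: float, n: int, rng: random.Random) -> float:
    """Estimate P(X > threshold) for X ~ N(0, 1) by sampling from q = N(threshold, 1).

    Half of the proposal samples land above the threshold; each hit is
    reweighted by the likelihood ratio p(x) / q(x) = exp(-t*x + t^2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


if __name__ == "__main__":
    t, n = 5.0, 100_000
    print(f"exact               {exact_tail(t):.3e}")
    print(f"naive Monte Carlo   {naive_mc(t, n, random.Random(0)):.3e}")
    print(f"importance sampling {importance_sampling(t, n, random.Random(0)):.3e}")
```

The hard part in the research direction above is that a real model gives no closed-form proposal like $N(t, 1)$; the hope is that structure in the activation space can play that role.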

Google at NeurIPS 2022

Google Research AI blog

A Workshop for Algorithmic Efficiency in Practical Neural Network Training
Workshop Organizers include: Zachary Nado, George Dahl, Naman Agarwal, Aakanksha Chowdhery
Invited Speakers include: Aakanksha Chowdhery, Priya Goyal

Human in the Loop Learning (HiLL)
Workshop Organizers include: Fisher Yu, Vittorio Ferrari
Invited Speakers include: Dorsa (..)

AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

The AI Alignment Forum

And the way you said it just then, it sounded more like the first one: here's a new nice metric of how good your mechanistic explanation is.

(00:26:47): And so what this gives us is an interaction metric where we can measure how bad this hypothesis is. There's a little bit of structure but not very much.
