Google at ICLR 2023

Google Research AI blog

If you’re registered for ICLR 2023, we hope you’ll visit the Google booth to learn more about the exciting work we’re doing across topics spanning representation and reinforcement learning, theory and optimization, social impact, safety and privacy, and applications from generative AI to speech and robotics.

The Theoretical Reward Learning Research Agenda: Introduction and Motivation

The AI Alignment Forum

Concretely, this research agenda involves answering questions such as: What is the right method for expressing goals and instructions to AI systems? Some relevant criteria for evaluating a specification language include: How expressive is the language? Are there things it cannot express? For details, see e.g. this paper.

Timaeus in 2024

The AI Alignment Forum

Published on February 20, 2025 11:54 PM GMT. TLDR: We made substantial progress in 2024: We published a series of papers that verify key predictions of Singular Learning Theory (SLT) [1, 2, 3, 4, 5, 6]. The S4 correspondence in small language models. in funding for 2025. Alignment).

Stanford AI Lab Papers and Talks at NeurIPS 2021

Stanford AI Lab Blog

The thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) 2021 is being hosted virtually from Dec 6th - 14th. Some of the members in our SAIL community also serve as co-organizers of several exciting workshops that will take place on Dec 13-14, so we hope you will check them out! Smith, Scott W.

Stanford AI Lab Papers and Talks at ICLR 2022

Stanford AI Lab Blog

Manning, Jure Leskovec
Contact: xikunz2@cs.stanford.edu
Award nominations: Spotlight
Links: Paper | Website
Keywords: knowledge graph, question answering, language model, commonsense reasoning, graph neural networks, biomedical qa
Fast Model Editing at Scale
Authors: Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D.

Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

Adversarial machine learning: This cluster of research areas uses simulated red-team/blue-team exercises to expose the vulnerabilities of an LLM (or a system that incorporates LLMs). We think this adversarial style of evaluation and iteration is necessary to ensure an AI system has a low probability of catastrophic failure.

Other Papers About the Theory of Reward Learning

The AI Alignment Forum

Goodhart's Law in Reinforcement Learning: As you probably know, "Goodhart's Law" is an informal principle which says that "if a proxy is used as a target, it will cease to be a good proxy". Moreover, this dynamic is at the core of many stories of how we could get catastrophic risks from AI systems. For details, see the full paper.