
Other Papers About the Theory of Reward Learning

The AI Alignment Forum

Published on February 28, 2025, 7:26 PM GMT. This is the seventh post in the theoretical reward learning sequence, which starts with this post. The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret. In this paper, we look at what happens when a learnt reward function is optimised.
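To make the failure mode concrete, here is a minimal toy sketch (my own illustration, not an example from the paper): a learned reward that matches the true reward exactly on every training state can still assign a wildly wrong value to an off-distribution state, so a policy that optimises the learned reward incurs maximal regret despite zero training error.

```python
import numpy as np

# Hypothetical 3-state bandit: the true reward and a learned reward
# agree on the training states {0, 1} but disagree badly on the
# off-distribution state 2.
true_reward = np.array([1.0, 0.5, 0.0])
learned_reward = np.array([1.0, 0.5, 10.0])  # fits the training data exactly

train_states = [0, 1]
train_error = np.mean(
    (true_reward[train_states] - learned_reward[train_states]) ** 2
)
print(f"training error: {train_error:.2f}")  # 0.00 -- zero training error

# Optimising the learned reward picks the off-distribution state...
greedy_state = int(np.argmax(learned_reward))
# ...whose true value is the worst available, so regret is maximal.
regret = true_reward.max() - true_reward[greedy_state]
print(f"regret under the true reward: {regret:.2f}")  # 1.00
```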


The Theoretical Reward Learning Research Agenda: Introduction and Motivation

The AI Alignment Forum

Some relevant criteria for evaluating a specification language include: how expressive is the language? The Rest of This Sequence: In the coming entries of this sequence, I will provide in-depth summaries of some of my papers, and explain their setup and results in more detail (though in less detail than the papers themselves).


AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

The AI Alignment Forum

Is that a fine, very brief summary of this? Jason Gross (00:27:59): Yeah, I think that's a pretty good summary of the theoretical approach. Daniel Filan (00:28:50): If people remember my singular learning theory episodes, they'll get mad at you for saying that quadratics are all there is, but it's a decent approximation.
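For readers who want the "quadratics" point spelled out: near a non-degenerate minimum the gradient vanishes, so a smooth loss is locally captured by its second-order Taylor expansion. A small sketch (my own illustration; the toy loss and finite-difference Hessian are placeholders, not anything from the episode):

```python
import numpy as np

# Near a non-degenerate minimum w*, the gradient is zero, so
#   L(w) ≈ L(w*) + 0.5 * H * (w - w*)^2
# where H is the second derivative (Hessian) at w*.
def loss(w):
    return w**4 + w**2  # smooth toy loss with its minimum at w* = 0

w_star, h = 0.0, 1e-4
# Finite-difference second derivative at the minimum.
H = (loss(w_star + h) - 2 * loss(w_star) + loss(w_star - h)) / h**2

for w in [0.01, 0.1, 0.5]:
    quad = loss(w_star) + 0.5 * H * (w - w_star) ** 2
    print(f"w={w}: loss={loss(w):.6f}, quadratic approx={quad:.6f}")
# Close for small w; the quartic term grows with w. Singular learning
# theory's objection concerns degenerate minima, where H vanishes and
# the quadratic picture breaks down entirely.
```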
