Published on February 28, 2025 7:26 PM GMT

This is the seventh post in the theoretical reward learning sequence, which starts in this post.

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

In this paper, we look at what happens when a learnt reward function is optimised.
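To make the title's claim concrete, here is a minimal sketch (a toy construction for illustration, not an example taken from the paper): a reward model can fit every reward observed during training exactly, yet the policy that optimises it can do as badly as possible under the true reward, because optimisation pushes the policy onto inputs the training data never covered.

```python
import numpy as np

# Toy 3-armed bandit (hypothetical numbers, chosen for illustration).
true_reward = np.array([1.0, 0.5, 0.2])     # ground-truth reward per arm
learned_reward = np.array([1.0, 0.5, 9.9])  # agrees on arms 0 and 1, wrong on arm 2

train_arms = [0, 1]  # only these arms appear in the reward-learning data

# Training error is exactly zero: the model matches every observed reward.
train_error = np.mean((learned_reward[train_arms] - true_reward[train_arms]) ** 2)
print(f"training error: {train_error}")  # 0.0

# But optimising the learned reward selects the unobserved arm 2,
# which is the worst arm under the true reward.
chosen_arm = int(np.argmax(learned_reward))
regret = true_reward.max() - true_reward[chosen_arm]
print(f"chosen arm: {chosen_arm}, regret: {regret}")  # arm 2, regret 0.8
```

Scaling the erroneous entry makes the learned reward look even more attractive without changing the training error at all, which is the sense in which low training error guarantees nothing about regret.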
Some relevant criteria for evaluating a specification language include: how expressive is the language?

The Rest of this Sequence

In the coming entries of this sequence, I will provide in-depth summaries of some of my papers, explaining their setup and results in more detail (though in less detail than the papers themselves provide).
Is that a fine, very brief summary of this?

Jason Gross (00:27:59): Yeah, I think that's a pretty good summary of the theoretical approach.

Daniel Filan (00:28:50): If people remember my singular learning theory episodes, they'll get mad at you for saying that quadratics are all there is, but it's a decent approximation.
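For readers who want the quadratic picture spelled out: the remark refers to the standard second-order Taylor expansion of the loss around a local minimum (my gloss, not part of the transcript). Near a minimum $\theta^\ast$,

$$ L(\theta) \;\approx\; L(\theta^\ast) + \tfrac{1}{2}\,(\theta - \theta^\ast)^\top H \,(\theta - \theta^\ast), \qquad H = \nabla^2 L(\theta^\ast). $$

Singular learning theory is concerned with the case where $H$ is degenerate (has zero eigenvalues), so higher-order terms dominate and the quadratic approximation breaks down, which is what the "get mad at you" aside alludes to.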