Conservancy, Learning Theory and Metrics



Stanford AI Lab Papers and Talks at NeurIPS 2021

Stanford AI Lab Blog

DECEMBER 6, 2021

Kochenderfer
Contact: philhc@stanford.edu
Links: Paper
Keywords: deep learning or neural networks, sparsity and feature selection, variational inference, (application) natural language and text processing

Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss
Authors: Jeff Z.


Other Papers About the Theory of Reward Learning

The AI Alignment Forum

FEBRUARY 28, 2025

We also managed to leverage these results to produce a new method for conservative optimisation, which tells you how much (and in what way) you can optimise a proxy reward, based on the quality of that proxy (as measured by a STARC metric), in order to be guaranteed that the true reward doesn't decrease (and thereby prevent the Goodhart drop).


Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

FEBRUARY 7, 2025

Motivation: Control evaluations are an attempt to conservatively evaluate the safety of protocols like AI-critiquing-AI (e.g., [...]). [...] We prefer this definition of success at unlearning over less conservative metrics, such as those in Lynch et al., because we think this definition more clearly distinguishes unlearning from safety training/robustness.


