
Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

The authors successfully improve robustness against a wide range of adversarial attacks. We'd also be keen to see comparisons with supervised finetuning, RLHF, and adversarial training where appropriate. … there is a gap between theoretical predictions (e.g., VC theory) and the generalization performance we see in practice. (Sheshadri et al.; Guan et al.; Karvonen et al.; Abbe et al.)
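As a rough gloss on the learning-theory gap mentioned in this excerpt (my framing, not the post's): a classical VC-style bound says that, with high probability over an i.i.d. sample of size $n$,

$$ \sup_{h \in \mathcal{H}} \left| R(h) - \hat{R}_n(h) \right| \;\le\; O\!\left( \sqrt{ \frac{ d_{\mathrm{VC}}(\mathcal{H}) \log n }{ n } } \right), $$

where $R$ is the true risk and $\hat{R}_n$ the empirical risk on the sample. For modern neural networks, $d_{\mathrm{VC}}$ typically exceeds $n$, making the bound vacuous even though trained models generalize well in practice; that mismatch is the gap being referenced.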


AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

The AI Alignment Forum

You're the lead author, and then there's a bunch of other authors that I don't want to read on air, but can you give us a sense of just what's the idea here? Daniel Filan (00:28:50): If people remember my singular learning theory episodes, they'll get mad at you for saying that quadratics are all there is, but it's a decent approximation.
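For context on the quadratic remark (a standard sketch, not taken from the episode): near a local minimum $\theta^*$, the loss is often approximated by its second-order Taylor expansion,

$$ \mathcal{L}(\theta) \;\approx\; \mathcal{L}(\theta^*) + \tfrac{1}{2}\,(\theta - \theta^*)^{\top} H\,(\theta - \theta^*), \qquad H = \nabla^2 \mathcal{L}(\theta^*), $$

since the gradient vanishes at the minimum. Singular learning theory's point is that for neural networks $H$ is typically degenerate, so the purely quadratic picture breaks down, which is presumably why SLT listeners would object.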
