
Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

The authors successfully improve robustness against a wide range of adversarial attacks. We'd also be keen to see comparisons with supervised finetuning, RLHF, and adversarial training where appropriate. … there is a gap between theoretical predictions (e.g., VC theory) and the generalization performance we see in practice. (Sheshadri et al.; Guan et al.; Karvonen et al.; Abbe et al.)
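As a rough gloss on the learning-theory gap mentioned in this excerpt (my framing, not the post's): a classical VC-style bound says that, with high probability over an i.i.d. sample of size $n$,

$$ \sup_{h \in \mathcal{H}} \left| R(h) - \hat{R}_n(h) \right| \;\le\; O\!\left( \sqrt{ \frac{ d_{\mathrm{VC}}(\mathcal{H}) \log n }{ n } } \right), $$

where $R$ is the true risk and $\hat{R}_n$ the empirical risk on the sample. For modern neural networks, $d_{\mathrm{VC}}$ typically exceeds $n$, making the bound vacuous even though trained models generalize well in practice; that mismatch is the gap being referenced.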


AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

The AI Alignment Forum

You're the lead author, and then there's a bunch of other authors that I don't want to read on air, but can you give us a sense of just what's the idea here? Daniel Filan (00:28:50): If people remember my singular learning theory episodes, they'll get mad at you for saying that quadratics are all there is, but it's a decent approximation.
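For context on the quadratic remark (a standard sketch, not taken from the episode): near a local minimum $\theta^*$, the loss is often approximated by its second-order Taylor expansion,

$$ \mathcal{L}(\theta) \;\approx\; \mathcal{L}(\theta^*) + \tfrac{1}{2}\,(\theta - \theta^*)^{\top} H\,(\theta - \theta^*), \qquad H = \nabla^2 \mathcal{L}(\theta^*), $$

since the gradient vanishes at the minimum. Singular learning theory's point is that for neural networks $H$ is typically degenerate, so the purely quadratic picture breaks down, which is presumably why SLT listeners would object.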
