
Research directions Open Phil wants to fund in technical AI safety

The AI Alignment Forum

We'd also be keen to see comparisons with supervised finetuning, RLHF, and adversarial training where appropriate. More ambitiously, research like this could advance our understanding of learning mechanisms in general (cf. Karvonen et al.) and could be useful for testing this theory's predictions (this Manifold market).


AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

The AI Alignment Forum

Daniel Filan (00:28:50): If people remember my singular learning theory episodes, they'll get mad at you for saying that quadratics are all there is, but it's a decent approximation. (00:28:56): But maybe zooming out, the relevant comparison point here I think is not the number of parameters in the model.
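As context for the "quadratics" remark: the usual quadratic picture is a second-order Taylor expansion of the loss around a trained minimum. A minimal sketch, with \theta^* for the minimum and H for the Hessian (notation introduced here for illustration, not from the transcript):

```latex
% Quadratic (second-order Taylor) approximation of the loss L around a minimum \theta^*.
% The gradient term vanishes at the minimum, so the local shape is governed by the Hessian H.
L(\theta) \approx L(\theta^*) + \tfrac{1}{2}\,(\theta - \theta^*)^{\top} H\,(\theta - \theta^*),
\qquad H = \nabla_{\theta}^{2} L(\theta)\big|_{\theta = \theta^*}
```

Singular learning theory's objection, which Filan alludes to, is that at many minima of neural network losses H is degenerate, so this quadratic approximation misses the local geometry that matters.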
