Why "training against scheming" is hard
The AI Alignment Forum
JUNE 24, 2025
For example, the AI shouldn’t break the law, violate ethical constraints, or turn off its oversight to achieve these goals. If the boss really wanted the trader not to break the law, they would have to create an incentive that counterbalances all of the positive rewards of breaking the law.