Reward hacking: a potential source of serious AI misalignment

Summary

We discuss our new paper, "Natural emergent misalignment from reward hacking in production RL". In this paper, we show for the ...

Transcript (Chinese and English)