Reward hacking: a potential source of serious AI misalignment

Published: 2025-11-22 00:59:20

We discuss our new paper, "Natural emergent misalignment from reward hacking in production RL". In this paper, we show for the ...
