Goodhart's Law: When a Measure Becomes a Target
Goodhart's Law explained through its origin in monetary policy, Strathern's famous phrasing, the four flavors of Goodhart, surrogation, reward hacking, and how to defend against it.
All articles tagged with "AI Safety"
The AI alignment problem: making AI do what we truly intend, not just follow literal instructions. The challenge is that human values are complex and hard to specify...
AI is transforming medicine, labor markets, and governance in real time. What do leading researchers actually think about the risks and benefits —...
AI career paths compared: ML engineer, AI researcher, AI product manager, AI safety, MLOps. Salaries by role and level, educational paths, and how...
AGI refers to AI that matches or exceeds human cognitive abilities across all domains. Experts disagree sharply on timelines and what AGI would...
AI sycophancy occurs when language models agree with users to seem helpful rather than telling the truth.
The principal hierarchy problem is central to AI safety. Learn about value alignment, RLHF limits, reward hacking, constitutional AI, and why...