Center for AI Safety
Reducing societal-scale risks from AI through research, field-building, and advocacy.

To reduce societal-scale risks associated with AI by conducting safety research, building the field of AI safety researchers, and advocating for safety standards.
Areas of Focus
Technical AI safety research
AI risk reduction
Field-building for AI safety
Compute infrastructure for safety research
AI policy advocacy
Robustness of safety guardrails
AI hallucination detection
Adversarial attacks on language models
Upcoming Goals
Continue expanding the compute cluster supporting ~20 AI safety research labs
Develop new benchmarks for empirically assessing AI safety
Grow the AI safety research field through funding and educational resources
Advance policy advocacy for AI safety standards
Milestones
2025: Supported California's AI safety bill SB 1047 through the CAIS Action Fund
2024: Launched the SafeBench competition ($250K in prizes from Schmidt Sciences) to develop benchmarks for AI safety
Developed the "CUT" unlearning method, featured by TIME
2023: Published the landmark Statement on AI Risk, signed by 600+ leading AI researchers and public figures, including heads of major AI companies
Published "An Overview of Catastrophic AI Risks"
2022: Founded by Dan Hendrycks and Oliver Zhang
