Center for AI Safety
Reducing societal-scale risks from AI through research, field-building, and advocacy.

To reduce societal-scale risks associated with AI by conducting safety research, building the field of AI safety researchers, and advocating for safety standards.
Areas of Focus
Technical AI safety research
AI risk reduction
Field-building for AI safety
Compute infrastructure for safety research
AI policy advocacy
Robustness of safety guardrails
AI hallucination detection
Adversarial attacks on language models
Upcoming Goals
Continue expanding the compute cluster supporting ~20 AI safety research labs
Develop new benchmarks for empirically assessing AI safety
Grow the AI safety research field through funding and educational resources
Advance policy advocacy for AI safety standards
Milestones
2025: Supported California's AI safety bill SB 1047 through the CAIS Action Fund
2024: Launched the SafeBench competition ($250K in prizes from Schmidt Sciences) to develop benchmarks for AI safety
Developed the "CUT" unlearning method, featured by TIME
2023: Published the landmark Statement on AI Risk, signed by 600+ leading AI researchers and public figures, including heads of major AI companies
Published "An Overview of Catastrophic AI Risks"
2022: Founded by Dan Hendrycks and Oliver Zhang
