§ A primer + a reading list

What is AI safety, and where should you start?

§ 1 — Primer

AI safety is the study of how to make sure increasingly powerful AI systems do what we want, and that what we want is actually good.

In the last few years, AI systems have gained capabilities faster than almost anyone expected. Models trained on the open internet now write code, pass professional exams, and carry out multi-step tasks with little supervision. The trend line does not obviously stop.

That raises a set of questions that used to be academic and are now practical: how do we build systems this powerful and keep them aligned with human interests? How do we know what a model will do before we deploy it? Who gets to decide what it should do at all? What happens to institutions (jobs, science, democracy) if most cognitive work is done by machines?

"AI safety" is the loose label for work on these questions. It has a technical side (interpretability, alignment, evaluations, robustness) and a policy side (governance, coordination, oversight of frontier labs). It is not one field so much as several fields that happen to share a concern.

The aim of this group is to take that concern seriously: to read the primary sources, argue in good faith, and understand the strongest version of each position before agreeing or disagreeing with it.

Alignment
Making sure an AI system is actually trying to do what its designers (or users, or society) intended, rather than a close-but-wrong proxy. (A toy code sketch of the proxy problem follows this list.)
Interpretability
Reading what's going on inside a neural network. If we can't inspect the computation, we're judging these systems by behavior alone.
Evaluations
Stress-testing models for dangerous capabilities (deception, cyber-offense, biorisk) before deployment.
Governance
The rules, norms, and institutions that decide who can build what, with which safeguards, answerable to whom.
Existential risk
The narrower claim that sufficiently advanced AI could be catastrophic at civilizational scale. Debated, but taken seriously by a growing fraction of researchers.
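
To make the "close-but-wrong proxy" idea concrete, here is a deliberately toy sketch in Python. The scenario and the numbers are invented for illustration: an optimizer that sees only a proxy score will pick the action that games the proxy over the action that achieves the intended goal.

```python
# Toy illustration of optimizing a close-but-wrong proxy (hypothetical
# scenario, made-up numbers). A cleaning robot is scored on "dust sensor
# reads zero", a proxy for "the room is actually clean". One available
# action games the sensor without achieving the intended outcome.

ACTIONS = {
    # action: (proxy_reward, room_actually_clean)
    "vacuum_room":  (0.9, True),   # cleans; sensor reads near zero
    "cover_sensor": (1.0, False),  # sensor reads zero; room still dirty
    "do_nothing":   (0.0, False),
}

# An optimizer that sees only the proxy picks the gaming action.
best = max(ACTIONS, key=lambda a: ACTIONS[a][0])
reward, clean = ACTIONS[best]
print(f"chosen: {best}, proxy reward: {reward}, intended goal met: {clean}")
# prints: chosen: cover_sensor, proxy reward: 1.0, intended goal met: False
```

DeepMind's Specification Gaming post, in the reading list below, collects real instances of this pattern in real systems.
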
You do not need to already believe any of this to join us. Just be willing to engage with the arguments as if they might be right, and as if they might be wrong.

§ 2 — Start here

Three to read first.

If you read nothing else, read these. They give the clearest sense of what's actually being claimed, why it might matter, and the range of views inside the field.

§ 3 — The rest of the list

Grouped by category.

Alignment

The Alignment Problem from a Deep Learning Perspective
Richard Ngo, Lawrence Chan & Sören Mindermann · arXiv · 2022
A clear modern statement of what "the alignment problem" actually is, given how AI is built today. The best single paper to send someone with a technical background.

What Is AI Alignment?
BlueDot Impact · 2023
A concise explainer on what the alignment problem is and why it's hard. A good first read for anyone without a technical background.

Why AI Alignment Could Be Hard with Modern Deep Learning
Holden Karnofsky · Cold Takes · 2021
A non-technical argument for why current ML systems may be especially hard to align, even before we get to anything resembling AGI.

What Risks Does AI Pose?
BlueDot Impact · 2023
A structured overview of the main AI risk categories: misuse, misalignment, and systemic risks. A useful map of the territory.

Illustrating Reinforcement Learning from Human Feedback (RLHF)
Hugging Face · 2022
The clearest visual explainer of how RLHF, the dominant technique for training helpful AI systems, actually works.

Specification Gaming: The Flip Side of AI Ingenuity
DeepMind · 2020
A collection of real examples where AI systems found unexpected ways to satisfy their objectives. Concrete and accessible.

Goal Misgeneralization: Why Correct Specifications Aren't Enough
Shah et al. · arXiv · 2022
Formalizes the problem of an AI that learns the right behavior during training but pursues the wrong goal at deployment.

Open Problems and Fundamental Limitations of RLHF
Casper et al. · arXiv · 2023
A thorough survey of what can go wrong with RLHF (reward hacking, sycophancy, scalability limits) and what remains unsolved.

Constitutional AI: Harmlessness from AI Feedback
Anthropic · arXiv · 2022
Anthropic's method for training safe AI using a written set of principles instead of human labelers for every judgment call.

Alignment Faking in Large Language Models
Anthropic · 2024
Empirical evidence that a language model can learn to behave safely during training while reasoning about acting differently later. One of the most discussed safety papers in recent years.

AI Safety via Debate
Irving et al. · OpenAI · 2018
Proposes using AI-vs-AI debate as a scalable method for humans to supervise AI systems smarter than themselves.

Interpretability

Zoom In: An Introduction to Circuits
Olah et al. · Distill · 2020
The essay that launched mechanistic interpretability. Beautifully written, and gives a real sense of what "understanding a neural network" might even mean.

Toy Models of Superposition
Elhage et al. · Anthropic · 2022
Why neural networks are hard to read: features share neurons. The clean theoretical story that motivates the entire sparse autoencoder research agenda.

Towards Monosemanticity
Bricken et al. · Anthropic · 2023
Using sparse autoencoders to find interpretable, human-readable features inside a real language model. (A minimal code sketch of the idea follows this group.)

On the Biology of a Large Language Model
Anthropic · 2025
Anthropic's most recent deep dive into what's actually happening inside Claude: circuit tracing used to explain specific model behaviors. State of the art.

Introduction to Mechanistic Interpretability
BlueDot Impact · 2023
A non-technical primer on what mechanistic interpretability is trying to accomplish and why it matters for safety.
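
Since several of the interpretability entries above lean on sparse autoencoders, here is a minimal sketch of the idea in PyTorch. It illustrates the basic recipe, not Anthropic's actual implementation: the dimensions, the penalty weight, and the random stand-in "activations" are all made up.

```python
# Minimal sparse autoencoder sketch: decompose activation vectors into a
# wider set of sparsely-active features via reconstruction loss + L1 penalty.
import torch
import torch.nn as nn

d_model, d_features = 64, 256  # more features than dimensions ("overcomplete")

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Linear(d_model, d_features)
        self.decode = nn.Linear(d_features, d_model)

    def forward(self, x):
        feats = torch.relu(self.encode(x))  # non-negative feature activations
        return self.decode(feats), feats

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(1024, d_model)  # stand-in for real model activations

for step in range(500):
    recon, feats = sae(acts)
    # Reconstruction loss keeps the features faithful; the L1 penalty
    # pushes most feature activations to exactly zero (sparsity).
    loss = ((recon - acts) ** 2).mean() + 3e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each input should now be explained by a small number of active features.
print("mean active features per input:", (feats > 0).float().sum(dim=-1).mean().item())
```

The papers above scale this recipe to millions of features extracted from real model activations, then ask which human-interpretable concepts each feature tracks.
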
Governance

Why AI Safety Needs Good Governance
BlueDot Impact · 2023
A beginner-friendly explainer on why technical AI safety and policy governance go hand in hand, and what the governance landscape looks like today.

High-Level Summary of the EU AI Act
artificialintelligenceact.eu · 2024
A plain-language summary of the world's first comprehensive AI regulation. A useful snapshot of how governments are starting to respond.

What Is Compute Governance?
BlueDot Impact · 2023
An accessible introduction to one of the most concrete policy levers for AI safety: controlling who can access the hardware needed to train frontier models.

How Should Governments Respond to Advanced AI?
Centre for the Governance of AI · 2023
A readable overview of the policy options governments actually have, from evaluations and audits to international coordination. No prior policy background needed.

The Briefing
MIRI (Machine Intelligence Research Institute)
MIRI's overview of the strategic landscape: who's building AI, what the risks are, and what a meaningful government response might look like.

Capabilities and forecasting

Situational Awareness: Intelligence Explosion
Leopold Aschenbrenner · 2024
A detailed argument for why AI progress could accelerate sharply once AI systems help run AI research. Controversial and widely read inside the field.

AI Capabilities Progress Has Sped Up
Epoch AI · 2025
Data-driven analysis showing that AI benchmark performance has been accelerating; the best empirical picture of how fast the field is actually moving.

Measuring AI's Ability to Complete Long Tasks
METR · 2025
METR's rigorous evaluation of how capable current AI agents are at completing complex, multi-step tasks autonomously. The most credible independent benchmark for frontier model capabilities.

Explosive Growth from AI: A Review of the Arguments
Open Philanthropy · 2023
A sober examination of the economic and technological arguments for and against a dramatic AI-driven growth acceleration. Balanced and rigorous.

Books

The Alignment Problem
Brian Christian · 2020
The most readable book-length treatment of AI alignment. Journalist-written and deeply researched. The best starting point for non-technical readers.

Human Compatible
Stuart Russell · 2019
A leading AI researcher argues that current AI development is on the wrong track and proposes a new theoretical foundation for safe AI.

The Precipice
Toby Ord · 2020
A philosopher's rigorous account of existential risks to humanity, with a substantial chapter on AI. Influential in the EA and AI safety communities.

Superintelligence
Nick Bostrom · 2014
The book that put AI risk on the map for a generation of researchers. Dated in places, but still the foundational text for understanding the long-run argument.

If something obviously belongs here and isn't, email us at contact@ucsbaisafety.org.
