Tuan Anh Le

Human Compatible

24 April 2023

These are notes on Stuart Russell’s book, Human Compatible: Artificial Intelligence and the Problem of Control.

With recent AI advances like GPT-4 and other large language models (LLMs), the question of AI control and safety has become more urgent. This book is a good informal introduction to the topic, although, having been published in 2019, it doesn’t discuss the many advances of the past four years. It may be worth additionally watching some of Stuart Russell’s more recent lectures (example). Note also that this is only one of many views in the now rapidly evolving field of AI safety; see the pointers to literature by David Duvenaud. Overall, the book clearly conveys the message that the AI community is building something powerful that could be either very good or very bad, and that we should make sure it’s the former.

The core idea is that, in order to design safe AI systems, we need to ensure that:

  1. Machines maximize the realization of human preferences.
  2. Machines are uncertain about human preferences.
  3. Machines learn about human preferences from human behavior.
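The interplay of principles 2 and 3 can be illustrated with a toy sketch: a machine that starts uncertain between candidate human preference hypotheses and narrows that uncertainty by observing human choices. Everything here (the hypotheses, the rewards, the Boltzmann-rational choice model) is an illustrative assumption, not something from the book itself.

```python
import math

# Toy example: the machine is uncertain which of two hypothetical
# reward functions the human has (principle 2), and does Bayesian
# updates on its belief by watching the human's choices (principle 3).

# Two candidate preference hypotheses over three options (made up).
REWARDS = {
    "likes_coffee": {"coffee": 2.0, "tea": 0.0, "water": 0.5},
    "likes_tea":    {"coffee": 0.0, "tea": 2.0, "water": 0.5},
}

def choice_likelihood(option, hypothesis, beta=2.0):
    """P(human picks `option` | hypothesis), assuming the human is
    Boltzmann-rational: choice probability is a softmax over rewards."""
    r = REWARDS[hypothesis]
    z = sum(math.exp(beta * v) for v in r.values())
    return math.exp(beta * r[option]) / z

def update_belief(belief, observed_choice):
    """One Bayesian update of P(hypothesis) after an observed choice."""
    posterior = {h: p * choice_likelihood(observed_choice, h)
                 for h, p in belief.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Start maximally uncertain, then learn from behavior.
belief = {"likes_coffee": 0.5, "likes_tea": 0.5}
for choice in ["tea", "tea", "water"]:
    belief = update_belief(belief, choice)

# After watching the human pick tea twice, the machine's belief should
# now strongly favor the "likes_tea" hypothesis; the "water" choice is
# equally likely under both hypotheses and leaves the belief unchanged.
```

The point of the sketch is that the machine never commits to a fixed objective: it acts under a distribution over human preferences that observation continually refines.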

This is in contrast to what Russell calls the “standard model” — machines optimizing a fixed, fully specified objective — which comes with well-known problems: a misspecified objective is pursued literally and without limit (the King Midas problem), and a machine certain of its objective has an incentive to resist being switched off, since being switched off prevents it from achieving that objective.

Other things I found interesting, in no particular order.