Tuan Anh Le

Measure theory for probability (UNFINISHED)

This note collects the minimum amount of measure theory needed to understand the probability theory behind machine learning.

These notes are based on (Capinski & Kopp, 2013), (Rosenthal, 2006) and (Qian, 2016).

Definition (\(\sigma\)-algebra). Let \(\Omega\) be a set. Then a \(\sigma\)-algebra \(\mathcal F\) is a nonempty collection of subsets of \(\Omega\) such that

  1. \(\Omega \in \mathcal F\).
  2. If \(A \in \mathcal F\), then its complement \(\Omega \setminus A\) is in \(\mathcal F\).
  3. If \(A_1, A_2, \dots\) is a sequence of elements of \(\mathcal F\), then the countable union \(\bigcup_{n = 1}^\infty A_n\) is in \(\mathcal F\).

Call \((\Omega, \mathcal F)\) a measurable space. \(\square\)
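To make the three axioms concrete, here is a minimal sketch that checks them for a collection of subsets of a *finite* set \(\Omega\). The function name `is_sigma_algebra` is our own, hypothetical helper. The finiteness assumption matters: a sequence of subsets of a finite set takes only finitely many distinct values, so closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, F):
    """Check the sigma-algebra axioms for a collection F of subsets of a finite set omega.

    Assumes omega is finite, so closure under countable unions reduces to
    closure under pairwise unions (finite unions then follow by induction).
    """
    omega = frozenset(omega)
    F = {frozenset(A) for A in F}
    if omega not in F:                           # axiom 1: Omega is in F
        return False
    if any(omega - A not in F for A in F):       # axiom 2: closed under complements
        return False
    # axiom 3 (finite case): closed under pairwise unions
    return all(A | B in F for A, B in combinations(F, 2))

omega = {1, 2, 3, 4}
print(is_sigma_algebra(omega, [set(), {1, 2}, {3, 4}, omega]))  # True
print(is_sigma_algebra(omega, [set(), {1}, omega]))            # False: {2, 3, 4} is missing
```

The second collection fails because it contains \(\{1\}\) but not its complement \(\{2, 3, 4\}\).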

Definition (Measure). Let \((\Omega, \mathcal F)\) be a measurable space. Let \(\mu: \mathcal F \to \bar{\mathbb R}\) be a mapping, where \(\bar{\mathbb R}\) denotes the set of extended real numbers. Then \(\mu\) is called a measure on \(\mathcal F\) if and only if it has the following properties:

  1. For every \(F \in \mathcal F\), \(\mu(F) \geq 0\).
  2. For every sequence of pairwise disjoint sets \(S_n \in \mathcal F\): \begin{align} \mu\left(\bigcup_{n = 1}^\infty S_n \right) = \sum_{n = 1}^\infty \mu(S_n). \end{align} (that is, \(\mu\) is countably additive).
  3. \(\mu(\emptyset) = 0\). \(\square\)
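A simple family of measures, sketched below under the assumption that \(\Omega\) is finite and \(\mathcal F\) is its power set, assigns each point a nonnegative weight and sets \(\mu(A)\) to the sum of the weights in \(A\); all weights equal to 1 gives the counting measure. The helper `weighted_measure` is a hypothetical name. On a finite set only finitely many sets in a pairwise disjoint sequence can be nonempty, so countable additivity reduces to the finite check shown.

```python
from fractions import Fraction

def weighted_measure(weights):
    """Return mu(A) = sum of weights[w] over w in A, on the power set of a finite set.

    Nonnegative weights make mu nonnegative; mu(empty set) = 0 because the sum is empty;
    and mu of a disjoint union equals the sum of the parts because each point is counted once.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be nonnegative")
    return lambda A: sum((weights[w] for w in A), Fraction(0))

counting = weighted_measure({w: 1 for w in "abcd"})  # counting measure on {a, b, c, d}
disjoint = [{"a"}, {"b", "c"}, set()]                 # pairwise disjoint events
union = set().union(*disjoint)
print(counting(union), sum(counting(S) for S in disjoint))  # 3 3
print(counting(set()))                                      # 0
```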

Definition (Probability measure). Let \((\Omega, \mathcal F)\) be a measurable space. A measure \(P\) on this space is called a probability measure if \(P(\Omega) = 1\).

Call \((\Omega, \mathcal F, P)\) a probability triple. \(\square\)
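As a small illustration, assume a single roll of a fair six-sided die: \(\Omega = \{1, \dots, 6\}\), \(\mathcal F\) is the power set, and \(P(A) = |A| / 6\). This is the counting measure divided by \(|\Omega|\), so it is a measure with \(P(\Omega) = 1\). The die example and the function `P` are illustrative choices, not taken from the references.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # outcomes of one roll of a six-sided die

def P(A):
    """Uniform probability on the power set of omega: P(A) = |A| / |omega|."""
    A = frozenset(A)
    if not A <= omega:
        raise ValueError("event must be a subset of omega")
    return Fraction(len(A), len(omega))

print(P(omega))      # 1
print(P({2, 4, 6}))  # 1/2, the probability of an even roll
```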

Definition (Measurable function). Let \((\Omega, \mathcal F)\) and \((\mathcal X, \mathcal E)\) be measurable spaces, and let \(f: \Omega \to \mathcal X\) be a function. For \(E \in \mathcal E\), define the preimage \(f^{-1}(E) := \{\omega \in \Omega : f(\omega) \in E\}\). We say that \(f\) is \(\mathcal F\)-measurable if \(f^{-1}(E) \in \mathcal F\) for all \(E \in \mathcal E\). \(\square\)
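Measurability depends on how fine \(\mathcal F\) is. The sketch below assumes finite spaces, represents \(f\) as a dictionary, and takes \(\mathcal F = \{\emptyset, \{1, 2\}, \{3, 4\}, \Omega\}\) on \(\Omega = \{1, 2, 3, 4\}\) with \(\mathcal E\) the power set of \(\{0, 1\}\). The indicator of \(\{1, 2\}\) is measurable, but the parity map is not, since its preimage of \(\{1\}\) is \(\{1, 3\} \notin \mathcal F\). The helpers `power_set`, `preimage`, and `is_measurable` are hypothetical names.

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of the finite set s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def preimage(f, E):
    """f^{-1}(E) = {omega : f(omega) in E}, with f given as a dict omega -> value."""
    return frozenset(w for w, x in f.items() if x in E)

def is_measurable(f, F, calE):
    """True if the preimage of every E in calE lies in F."""
    F = {frozenset(A) for A in F}
    return all(preimage(f, E) in F for E in calE)

omega = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, omega]
calE = power_set({0, 1})
first_half = {w: int(w <= 2) for w in omega}  # indicator of {1, 2}
parity = {w: w % 2 for w in omega}            # 1 for odd outcomes, 0 for even ones
print(is_measurable(first_half, F, calE))  # True
print(is_measurable(parity, F, calE))      # False
```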

Definition (Random variable). Let \((\Omega, \mathcal F, P)\) be a probability triple. Let \((\mathcal X, \mathcal E)\) be a measurable space. Then a function \(X: \Omega \to \mathcal X\) is called a random variable if it is \(\mathcal F\)-measurable. \(\square\)

Definition (Probability distribution). Let \(X\) be a random variable on a probability triple \((\Omega, \mathcal F, P)\) with output space \((\mathcal X, \mathcal E)\). The probability distribution of \(X\) is \(P \circ X^{-1}\), that is, the map \(E \mapsto P(X^{-1}(E))\) for \(E \in \mathcal E\). We write \(P_X := P \circ X^{-1}\).

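Note that \(P_X\) is a probability measure on \((\mathcal X, \mathcal E)\). It is well defined because \(X^{-1}(E) \in \mathcal F\) for every \(E \in \mathcal E\); it is countably additive because preimages of pairwise disjoint sets are pairwise disjoint and \(X^{-1}\left(\bigcup_n E_n\right) = \bigcup_n X^{-1}(E_n)\); and \(P_X(\mathcal X) = P(\Omega) = 1\).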

We also call \(P_X\) the law of \(X\) and denote it by \(\mathcal L(X)\). \(\square\)
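For a concrete law, assume two fair dice: \(\Omega\) is the set of 36 ordered pairs, \(\mathcal F\) its power set, \(P\) uniform, and \(X\) the sum of the two rolls. Computing \(P_X(E) = P(X^{-1}(E))\) directly from the definition gives, for example, \(P_X(\{7\}) = 6/36 = 1/6\). The example and the names `P`, `X`, and `P_X` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

omega = frozenset(product(range(1, 7), repeat=2))  # ordered outcomes of two die rolls

def P(A):
    """Uniform probability measure on the power set of omega."""
    return Fraction(len(A), len(omega))

def X(w):
    """Random variable: the sum of the two rolls."""
    return w[0] + w[1]

def P_X(E):
    """Law of X: P_X(E) = P(X^{-1}(E)) for a set E of possible sums."""
    return P({w for w in omega if X(w) in E})

print(P_X({7}))                # 1/6
print(P_X({2, 12}))            # 1/18
print(P_X(set(range(2, 13))))  # 1
```

The last line checks \(P_X(\mathcal X) = 1\) for \(\mathcal X = \{2, \dots, 12\}\).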

Definition (Integration).

Definition (Expectation).

Definition (Product measures).

Theorem (Radon-Nikodym).

Definition (Probability density).

Definition (Conditional expectation).

Definition (Conditional probability).

Theorem (Bayes’ rule).

Theorem (Sum rule).

Theorem (Product rule).


  1. Capinski, M., & Kopp, P. E. (2013). Measure, integral and probability. Springer Science & Business Media.
  2. Rosenthal, J. S. (2006). A first look at rigorous probability theory. World Scientific.
  3. Qian, Z. (2016). Lecture notes on the course “B8.1 Martingales through Measure Theory.” Mathematical Institute, University of Oxford. https://courses.maths.ox.ac.uk/node/124