The minimum amount of measure theory needed to understand the probability theory behind machine learning.
These notes are based on (Capinski & Kopp, 2013), (Rosenthal, 2006) and (Qian, 2016).
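Definition (\(\sigma\)-algebra). Let \(\Omega\) be a set. Then a \(\sigma\)-algebra \(\mathcal F\) is a nonempty collection of subsets of \(\Omega\) such that (i) if \(A \in \mathcal F\), then \(\Omega \setminus A \in \mathcal F\) (closure under complements), and (ii) if \(A_1, A_2, \dots \in \mathcal F\), then \(\bigcup_{i=1}^\infty A_i \in \mathcal F\) (closure under countable unions).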
Call \((\Omega, \mathcal F)\) a measurable space. \(\square\)
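For a finite \(\Omega\), the conditions for a \(\sigma\)-algebra can be checked by brute force. The sketch below assumes the standard axioms (a nonempty collection closed under complements and unions; on a finite set, pairwise unions suffice for countable ones). The helper name `is_sigma_algebra` and the example collections are hypothetical, introduced only for illustration.

```python
from itertools import combinations


def is_sigma_algebra(omega, collection):
    """Check the sigma-algebra axioms for subsets of a finite set omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    if not sets:
        return False  # the collection must be nonempty
    if any(not s <= omega for s in sets):
        return False  # every member must be a subset of omega
    # Closure under complements.
    if any(omega - s not in sets for s in sets):
        return False
    # Closure under unions (pairwise is enough for a finite collection).
    return all(a | b in sets for a, b in combinations(sets, 2))


omega = {1, 2, 3, 4}
trivial = [set(), omega]                     # smallest sigma-algebra
generated = [set(), {1, 2}, {3, 4}, omega]   # generated by {1, 2}
not_closed = [set(), {1}, omega]             # missing the complement of {1}

print(is_sigma_algebra(omega, trivial))
print(is_sigma_algebra(omega, generated))
print(is_sigma_algebra(omega, not_closed))
```

The third collection fails because \(\{2, 3, 4\}\), the complement of \(\{1\}\), is not in it.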
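Definition (Measure). Let \((\Omega, \mathcal F)\) be a measurable space. Let \(\mu: \mathcal F \to \bar{\mathbb R}\) be a mapping, where \(\bar{\mathbb R}\) denotes the set of extended real numbers. Then \(\mu\) is called a measure on \(\mathcal F\) if and only if it has the following properties: (i) \(\mu(A) \geq 0\) for all \(A \in \mathcal F\) (non-negativity), (ii) \(\mu(\emptyset) = 0\), and (iii) \(\mu\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty \mu(A_i)\) for every sequence \(A_1, A_2, \dots\) of pairwise disjoint sets in \(\mathcal F\) (countable additivity). \(\square\)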
Definition (Probability measure). Let \((\Omega, \mathcal F)\) be a measurable space. A measure \(P\) on this space is called a probability measure if \(P(\Omega) = 1\).
Call \((\Omega, \mathcal F, P)\) a probability triple. \(\square\)
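As a concrete instance, a fair die gives a probability triple with \(\Omega = \{1, \dots, 6\}\), \(\mathcal F\) the power set of \(\Omega\), and \(P(A) = |A| / 6\). The following is a minimal sketch under those assumptions; the names `power_set` and `P` are illustrative choices, and exact fractions are used so that the additivity check is not affected by floating-point error.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))  # faces of a fair die


def power_set(s):
    """All subsets of s, used here as the sigma-algebra F."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def P(event):
    """Uniform probability measure: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))


events = power_set(omega)

# Additivity on every pair of disjoint events (finite case of countable additivity).
additive = all(P(a | b) == P(a) + P(b)
               for a in events for b in events if not a & b)

print(P(omega), P(frozenset()), additive)
```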
Definition (Measurable function). Let \((\Omega, \mathcal F)\) be a measurable space. Let \((\mathcal X, \mathcal E)\) be another measurable space. Let \(f: \Omega \to \mathcal X\) be a function. Define \(f^{-1}(E) := \{\omega: \omega \in \Omega, f(\omega) \in E\}\) for \(E \in \mathcal E\). \(f\) is said to be \(\mathcal F\)-measurable if \(f^{-1}(E) \in \mathcal F\) for all \(E \in \mathcal E\). \(\square\)
Definition (Random variable). Let \((\Omega, \mathcal F, P)\) be a probability triple. Let \((\mathcal X, \mathcal E)\) be a measurable space. Then a function \(X: \Omega \to \mathcal X\) is called a random variable if it is \(\mathcal F\)-measurable. \(\square\)
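Measurability depends on the \(\sigma\)-algebra chosen on \(\Omega\), which a small finite example makes visible. The sketch below assumes a die whose \(\sigma\)-algebra only distinguishes odd from even faces; the helper names `preimage` and `is_measurable` and the two example functions are hypothetical.

```python
def preimage(f, omega, E):
    """f^{-1}(E) = {w in omega : f(w) in E}."""
    return frozenset(w for w in omega if f(w) in E)


def is_measurable(f, omega, F, E_collection):
    """True if the preimage of every set in E_collection lies in F."""
    F = {frozenset(s) for s in F}
    return all(preimage(f, omega, E) in F for E in E_collection)


omega = frozenset(range(1, 7))
odd, even = frozenset({1, 3, 5}), frozenset({2, 4, 6})
F = [frozenset(), odd, even, omega]  # only parity is observable

# Output space {0, 1} with its power set as sigma-algebra.
E_collection = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]


def parity(w):
    return w % 2


def is_high(w):
    return int(w >= 4)


print(is_measurable(parity, omega, F, E_collection))
print(is_measurable(is_high, omega, F, E_collection))
```

Here `parity` is a random variable with respect to \(\mathcal F\), while `is_high` is not, because \(\{4, 5, 6\}\) is not in \(\mathcal F\).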
Definition (Probability distribution). Given a random variable \(X\) on a probability triple \((\Omega, \mathcal F, P)\) and the output space \((\mathcal X, \mathcal E)\), the probability distribution of \(X\) is \(P \circ X^{-1}\). We write \(P_X := P \circ X^{-1}\).
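Note that \(P_X\) is a valid probability measure on \((\mathcal X, \mathcal E)\): it inherits non-negativity and countable additivity from \(P\) because preimages preserve disjoint unions, and \(P_X(\mathcal X) = P(X^{-1}(\mathcal X)) = P(\Omega) = 1\).
We also call \(P_X\) the law of \(X\) and denote it by \(\mathcal L(X)\). \(\square\)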
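On a finite space, \(P \circ X^{-1}\) can be computed by summing the probability of every outcome that \(X\) maps to a given value. The sketch below assumes two fair dice with \(X\) their sum; `pushforward` and `law_of` are illustrative helper names, not notation from the references.

```python
from collections import defaultdict
from fractions import Fraction

# Omega: ordered pairs of die faces, each with probability 1/36.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable: sum of the two faces."""
    return w[0] + w[1]


def pushforward(P, X):
    """Point masses of P_X: for each value x, P_X({x}) = P(X^{-1}({x}))."""
    law = defaultdict(Fraction)  # Fraction() is zero
    for w, p in P.items():
        law[X(w)] += p
    return dict(law)


P_X = pushforward(P, X)


def law_of(E):
    """P_X(E) for a set E of output values."""
    return sum((P_X.get(x, Fraction(0)) for x in E), Fraction(0))


print(P_X[7], law_of({2, 12}), law_of(range(2, 13)))
```

The last value printed is \(P_X(\mathcal X)\), which equals 1 as expected of a probability measure.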
Definition (Integration).
Definition (Expectation).
Definition (Product measures).
Theorem (Radon-Nikodym).
Definition (Probability density).
Definition (Conditional expectation).
Definition (Conditional probability).
Theorem (Bayes’ rule).
Theorem (Sum rule).
Theorem (Product rule).
References
@book{capinski2013measure,
title = {Measure, integral and probability},
author = {Capinski, Marek and Kopp, Peter E},
year = {2013},
publisher = {Springer Science \& Business Media}
}
@book{rosenthal2006first,
title = {A first look at rigorous probability theory},
author = {Rosenthal, Jeffrey Seth},
year = {2006},
publisher = {World Scientific}
}
@misc{qian2016martingales,
author = {Qian, Zhongmin},
title = {Lecture notes on the course ``B8.1 Martingales through Measure Theory''},
month = sep,
year = {2016},
publisher = {Mathematical Institute, University of Oxford},
link = {https://courses.maths.ox.ac.uk/node/124},
file = {../assets/pdf/qian2016martingales.pdf}
}