Tuan Anh Le

Reparameterization trick

05 September 2016

Consider a \((\Omega_X, \mathcal F_X)\)-valued random variable \(X\) and a \((\Omega_Y, \mathcal F_Y)\)-valued random variable \(Y\), both defined on a common probability space \((\Omega, \mathcal F, \mathbb P)\). Let \(f: (\Omega_X, \mathcal F_X) \to (\mathbb R, \mathcal B(\mathbb R))\) be a measurable function (i.e. for all \(B \in \mathcal B(\mathbb R)\), the pre-image \(f^{-1}(B) \in \mathcal F_X\)). Let \(g: (\Omega_Y, \mathcal F_Y) \to (\Omega_X, \mathcal F_X)\) be a measurable function such that \(X = g \circ Y\) (i.e. \(X(\omega) = g(Y(\omega)), \forall \omega \in \Omega\)).

Hence we have, by definition,
\begin{align}
\E[f(X)] := \int_{\Omega} f(X(\omega)) \mathbb P(\mathrm d \omega) = \int_{\Omega} f(g(Y(\omega))) \mathbb P(\mathrm d \omega) =: \E[f(g(Y))]. \label{eq:reparam/exp}
\end{align}

Let \(P_X := \mathbb P \circ X^{-1}, P_Y := \mathbb P \circ Y^{-1}\) be the probability distributions of \(X, Y\). We have two Monte Carlo estimators of the same quantity in \eqref{eq:reparam/exp}:
\begin{align}
\E[f(X)] &\approx I_X^{MC} := \frac{1}{N} \sum_{i = 1}^N f(X^i), && X^i \sim P_X, i = 1, \dotsc, N \\
\E[f(g(Y))] &\approx I_Y^{MC} := \frac{1}{N} \sum_{i = 1}^N f(g(Y^i)), && Y^i \sim P_Y, i = 1, \dotsc, N.
\end{align}
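To make the two estimators concrete, here is a minimal sketch under assumptions that are not part of the setup above: \(X\) is taken to be Gaussian with mean `mu` and standard deviation `sigma`, \(Y\) is standard normal, \(g(y) = \mu + \sigma y\), and \(f(x) = x^2\). The function names `estimate_direct` and `estimate_reparam` are made up for illustration.

```python
import random


def f(x):
    # Illustrative test function: f(x) = x^2, so E[f(X)] = mu^2 + sigma^2.
    return x * x


def g(y, mu, sigma):
    # Illustrative reparameterization: X = g(Y) = mu + sigma * Y with Y ~ N(0, 1).
    return mu + sigma * y


def estimate_direct(n, mu=1.0, sigma=2.0, seed=0):
    # I_X^MC: average f over samples drawn directly from P_X = N(mu, sigma^2).
    rng = random.Random(seed)
    return sum(f(rng.gauss(mu, sigma)) for _ in range(n)) / n


def estimate_reparam(n, mu=1.0, sigma=2.0, seed=1):
    # I_Y^MC: average f(g(.)) over samples drawn from P_Y = N(0, 1).
    rng = random.Random(seed)
    return sum(f(g(rng.gauss(0.0, 1.0), mu, sigma)) for _ in range(n)) / n


if __name__ == "__main__":
    print(estimate_direct(200_000), estimate_reparam(200_000))
```

With the default `mu = 1` and `sigma = 2`, both estimates should be close to \(\mu^2 + \sigma^2 = 5\), even though they draw from different distributions.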

Why is it useful?

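Let the distribution \(P_{X, \theta}\) be parameterized by \(\theta\). We can’t evaluate \(\frac{\partial I_X^{MC}}{\partial \theta}\) directly: the dependence on \(\theta\) is hidden inside the sampling step \(X^i \sim P_{X, \theta}\), so \(I_X^{MC}\) is not an explicit function of \(\theta\). However, if \(P_Y\) is not parameterized by \(\theta\) and \(g_{\theta}\) is parameterized by \(\theta\) such that \(X = g_{\theta} \circ Y\), then the samples \(Y^i\) don’t depend on \(\theta\) and, provided \(\theta \mapsto f(g_{\theta}(y))\) is differentiable, we can find \begin{align} \frac{\partial I_Y^{MC}}{\partial \theta} = \frac{1}{N} \sum_{i = 1}^N \frac{\partial}{\partial \theta} f(g_{\theta}(Y^i)). \end{align}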
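As a rough illustration of differentiating \(I_Y^{MC}\), the sketch below assumes a Gaussian example that is not part of the text above: \(\theta = (\mu, \sigma)\), \(Y \sim \mathcal N(0, 1)\), \(g_{\theta}(y) = \mu + \sigma y\) and \(f(x) = x^2\). The chain rule is applied by hand per sample, and the name `grad_reparam` is hypothetical.

```python
import random


def df(x):
    # Derivative of the illustrative test function f(x) = x^2.
    return 2.0 * x


def grad_reparam(n, mu, sigma, seed=0):
    # Estimates the gradient of I_Y^MC with respect to theta = (mu, sigma),
    # using g_theta(y) = mu + sigma * y and Y ~ N(0, 1).
    rng = random.Random(seed)
    d_mu = 0.0
    d_sigma = 0.0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)   # Y^i does not depend on theta
        x = mu + sigma * y        # X^i = g_theta(Y^i)
        d_mu += df(x) * 1.0       # chain rule: dg/dmu = 1
        d_sigma += df(x) * y      # chain rule: dg/dsigma = y
    return d_mu / n, d_sigma / n


if __name__ == "__main__":
    print(grad_reparam(200_000, mu=1.0, sigma=2.0))
```

In this example \(\E[f(X)] = \mu^2 + \sigma^2\), so the two returned values should be close to \(2\mu\) and \(2\sigma\). In practice an automatic differentiation library would usually replace the hand-written derivatives.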

This trick is often used in variational autoencoders.