# Reparameterization trick

05 September 2016

Consider a $$(\Omega_X, \mathcal F_X)$$-valued random variable $$X$$ and a $$(\Omega_Y, \mathcal F_Y)$$-valued random variable $$Y$$, both defined on a common probability space $$(\Omega, \mathcal F, \mathbb P)$$. Let $$f: (\Omega_X, \mathcal F_X) \to (\mathbb R, \mathcal B(\mathbb R))$$ be a measurable function (i.e. for all $$B \in \mathcal B(\mathbb R)$$, the pre-image $$f^{-1}(B) \in \mathcal F_X$$). Let $$g: (\Omega_Y, \mathcal F_Y) \to (\Omega_X, \mathcal F_X)$$ be a measurable function such that $$X = g \circ Y$$ (i.e. $$X(\omega) = g(Y(\omega)), \forall \omega \in \Omega$$).

Hence we have, by definition,

\begin{align}
\E[f(X)] := \int_{\Omega} f(X(\omega)) \mathbb P(\mathrm d \omega) = \int_{\Omega} f(g(Y(\omega))) \mathbb P(\mathrm d \omega) =: \E[f(g(Y))]. \label{eq:reparam/exp}
\end{align}

Let $$P_X := \mathbb P \circ X^{-1}, P_Y := \mathbb P \circ Y^{-1}$$ be the probability distributions of $$X, Y$$. We have two Monte Carlo estimators of the same quantity in \eqref{eq:reparam/exp}:

\begin{align}
\E[f(X)] &\approx I_X^{MC} := \frac{1}{N} \sum_{i = 1}^N f(X^i), && X^i \sim P_X, i = 1, \dotsc, N \\
\E[f(g(Y))] &\approx I_Y^{MC} := \frac{1}{N} \sum_{i = 1}^N f \circ g (Y^i), && Y^i \sim P_Y, i = 1, \dotsc, N.
\end{align}
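To make the two estimators concrete, here is a minimal sketch in plain Python. The specific choices are assumptions made only for illustration and are not part of the setup above: $$X \sim \mathcal N(\mu, \sigma^2)$$, $$Y \sim \mathcal N(0, 1)$$, $$g(y) = \mu + \sigma y$$ and $$f(x) = x^2$$, so that the exact value is $$\mu^2 + \sigma^2$$. The function names `mc_direct` and `mc_reparam` are hypothetical.

```python
import random


def f(x):
    # Illustrative integrand (an assumption): f(x) = x^2.
    return x * x


def g(y, mu, sigma):
    # Illustrative map (an assumption): X = g(Y) = mu + sigma * Y.
    return mu + sigma * y


def mc_direct(mu, sigma, n, rng):
    # I_X^MC: average f over samples X^i drawn directly from P_X = N(mu, sigma^2).
    return sum(f(rng.gauss(mu, sigma)) for _ in range(n)) / n


def mc_reparam(mu, sigma, n, rng):
    # I_Y^MC: average f(g(Y^i)) over samples Y^i drawn from P_Y = N(0, 1).
    return sum(f(g(rng.gauss(0.0, 1.0), mu, sigma)) for _ in range(n)) / n


rng = random.Random(0)
mu, sigma, n = 1.5, 0.5, 100_000

i_x = mc_direct(mu, sigma, n, rng)
i_y = mc_reparam(mu, sigma, n, rng)
exact = mu**2 + sigma**2  # E[X^2] for a Gaussian

print(i_x, i_y, exact)
```

Both printed estimates should land close to the exact value, up to Monte Carlo error of order $$1 / \sqrt N$$.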

## Why is it useful?

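Let the distribution $$P_{X, \theta}$$ of $$X$$ be parameterized by $$\theta$$. We can’t obtain a useful $$\frac{\partial I_X^{MC}}{\partial \theta}$$, because $$\theta$$ enters $$I_X^{MC}$$ only through the sampling step $$X^i \sim P_{X, \theta}$$, which is not an explicit function of $$\theta$$ that we can differentiate. However, if $$P_Y$$ does not depend on $$\theta$$ and $$g_{\theta}$$ is parameterized by $$\theta$$ such that $$X = g_{\theta} \circ Y$$, then for fixed samples $$Y^i$$ the estimator $$I_Y^{MC} = \frac{1}{N} \sum_{i = 1}^N f(g_{\theta}(Y^i))$$ is an explicit function of $$\theta$$, and we can find $$\frac{\partial I_Y^{MC}}{\partial \theta}$$ (provided $$f$$ and $$g_{\theta}$$ are differentiable).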
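As a rough illustration of the gradient computation, the sketch below uses assumed choices that are not part of the text above: $$\theta = (\mu, \sigma)$$, $$Y \sim \mathcal N(0, 1)$$, $$g_{\theta}(y) = \mu + \sigma y$$ and $$f(x) = x^2$$. The derivatives are written out by hand via the chain rule, $$\frac{\partial}{\partial \theta} f(g_{\theta}(y)) = f'(g_{\theta}(y)) \frac{\partial g_{\theta}(y)}{\partial \theta}$$; the function name `grad_reparam` is hypothetical, and in practice an automatic differentiation library would do this step.

```python
import random


def grad_reparam(mu, sigma, n, rng):
    """Gradient of I_Y^MC w.r.t. (mu, sigma) for f(x) = x^2, g(y) = mu + sigma * y."""
    d_mu = 0.0
    d_sigma = 0.0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)  # Y^i ~ P_Y, which does not depend on theta
        x = mu + sigma * y       # X^i = g_theta(Y^i)
        df_dx = 2.0 * x          # f'(x) for f(x) = x^2
        d_mu += df_dx * 1.0      # chain rule with dg/dmu = 1
        d_sigma += df_dx * y     # chain rule with dg/dsigma = y
    return d_mu / n, d_sigma / n


rng = random.Random(0)
mu, sigma = 1.5, 0.5

g_mu, g_sigma = grad_reparam(mu, sigma, 100_000, rng)

# Exact gradient of E[f(X)] = mu^2 + sigma^2 is (2 * mu, 2 * sigma).
print(g_mu, 2.0 * mu)
print(g_sigma, 2.0 * sigma)
```

For this toy case the exact gradient is known in closed form, so the estimate can be checked directly against $$(2\mu, 2\sigma)$$.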

This trick is often used in variational autoencoders.
