*07 November 2016*

Notes on the *Concrete distribution* (continuous relaxations of discrete distributions) which was introduced by (Maddison et al., 2016) and (Jang et al., 2016).
Notes are based on the former.

Given a random variable \(X: \Omega \to \mathcal X\) with distribution \(p_{\phi}(x)\) parameterized by \(\phi\); and a function \(f: \mathcal X \to \mathbb R\). The goal is to approximate \(\frac{\partial}{\partial \phi} \E[f(X)]\) using the reparameterization trick (instead of the REINFORCE trick) when \(X\) is discrete, i.e. when \(\mathcal X\) is countable.

The support of a \(K\)-dimensional Concrete distribution is the simplex \begin{align} \Delta^{K - 1} = \left\{x \in \mathbb R^K: x_k \in [0, 1], \sum_{k = 1}^K x_k = 1\right\}. \end{align} A sample \(X = (X_1, \dotsc, X_K)\) from the Concrete distributionâ€”parameterized by location parameters \(\alpha = (\alpha_1, \dotsc, \alpha_K) \in (0, \infty)^K\), and temperature \(\lambda \in (0, \infty)\)â€”can be generated as follows:

For \(k = 1, \dotsc, K\):

- Generate a sample \(G_k\) from a Gumbel distribution by setting \(G_k \leftarrow (-\log(-\log U_k))\) where \(U_k\) is a sample from \(\mathrm{Uniform}(0, 1)\).
- Set \begin{align} X_k \leftarrow \frac{\exp((\log \alpha_k + G_k)/\lambda)}{\sum_{i = 1}^K \exp((\log \alpha_i + G_i)/\lambda)}. \end{align}

The Concrete distribution is actually defined by its density (see equation (11) in (Maddison et al., 2016)). As \(\lambda\) is smaller, the approximation to the one-hot discrete random variable with unnormalized weights \(\alpha\) is more accurate in the sense of Proposition 1 in (Maddison et al., 2016).

Let \(Z = (Z_1, \dotsc, Z_K)\). Let \(Z_k, k = 1, \dotsc, K\) be distributed from \(\mathrm{Gumbel}(0, 1)\), and the \(k\)th output of the function \(g_{\phi}: \mathbb R^K \to \mathbb R^K\) be: \begin{align} \left[{g_{\phi}(Z)}\right]_k := \frac{\exp((\log \alpha_k + Z_k)/\lambda)}{\sum_{i = 1}^K \exp((\log \alpha_i + Z_i)/\lambda)}. \end{align}

Hence, \(g_{\phi}(Z)\) is a Concrete random variable and we can write:
\begin{align}
\E_{X \sim \mathrm{Concrete}}[f(X)] &= \E_{Z \sim \prod_k \mathrm{Gumbel}(0, 1)}[f(g_{\phi}(Z))].
\end{align}
Hence the derivatives of the objective will be:
\begin{align}
\frac{\partial}{\partial \phi} \E_{X \sim \mathrm{Concrete}}[f(X)]
&= \frac{\partial}{\partial \phi} \E_{Z \sim \prod_k \mathrm{Gumbel}(0, 1)}[f(g_{\phi}(Z))] \\

&= \E_{Z \sim \prod_k \mathrm{Gumbel}(0, 1)}\left[\frac{\partial}{\partial \phi} \left(f(g_{\phi}(Z))\right)\right].
\end{align}
This can be approximated by the Monte Carlo estimator:
\begin{align}
&\frac{1}{N} \sum_{n = 1}^N \frac{\partial}{\partial \phi} \left(f(g_{\phi}(Z^n))\right) \\

Z^n &:= (Z_1^n, \dotsc, Z_K^n) & n = 1, \dotsc, N \\

Z_k^n &\sim \mathrm{Gumbel}(0, 1) & k = 1, \dotsc, K, n = 1, \dotsc, N.
\end{align}

**References**

- Maddison, C. J., Mnih, A., & Teh, Y. W. (2016). The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables.
*ArXiv Preprint ArXiv:1611.00712*.@article{maddison2016concrete, title = {The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables}, author = {Maddison, Chris J. and Mnih, Andriy and Teh, Yee Whye}, journal = {arXiv preprint arXiv:1611.00712}, year = {2016} }

- Jang, E., Gu, S., & Poole, B. (2016). Categorical Reparameterization with Gumbel-Softmax.
*ArXiv Preprint ArXiv:1611.01144*.@article{jang2016categorical, title = {Categorical Reparameterization with Gumbel-Softmax}, author = {Jang, Eric and Gu, Shixiang and Poole, Ben}, journal = {arXiv preprint arXiv:1611.01144}, year = {2016} }

[back]