Tuan Anh Le

Concrete distribution

07 November 2016

Notes on the Concrete distribution (continuous relaxations of discrete distributions) which was introduced by (Maddison et al., 2016) and (Jang et al., 2016). Notes are based on the former.

Goal

Given a random variable \(X: \Omega \to \mathcal X\) with distribution \(p_{\phi}(x)\) parameterized by \(\phi\); and a function \(f: \mathcal X \to \mathbb R\). The goal is to approximate \(\frac{\partial}{\partial \phi} \E[f(X)]\) using the reparameterization trick (instead of the REINFORCE trick) when \(X\) is discrete, i.e. when \(\mathcal X\) is countable.

Concrete distribution

The support of a \(K\)-dimensional Concrete distribution is the simplex \begin{align} \Delta^{K - 1} = \left\{x \in \mathbb R^K: x_k \in [0, 1], \sum_{k = 1}^K x_k = 1\right\}. \end{align} A sample \(X = (X_1, \dotsc, X_K)\) from the Concrete distribution—parameterized by location parameters \(\alpha = (\alpha_1, \dotsc, \alpha_K) \in (0, \infty)^K\), and temperature \(\lambda \in (0, \infty)\)—can be generated as follows:

For \(k = 1, \dotsc, K\):

  1. Generate a sample \(G_k\) from a Gumbel distribution by setting \(G_k \leftarrow (-\log(-\log U_k))\) where \(U_k\) is a sample from \(\mathrm{Uniform}(0, 1)\).
  2. Set \begin{align} X_k \leftarrow \frac{\exp((\log \alpha_k + G_k)/\lambda)}{\sum_{i = 1}^K \exp((\log \alpha_i + G_i)/\lambda)}. \end{align}

The Concrete distribution is actually defined by its density (see equation (11) in (Maddison et al., 2016)). As \(\lambda\) is smaller, the approximation to the one-hot discrete random variable with unnormalized weights \(\alpha\) is more accurate in the sense of Proposition 1 in (Maddison et al., 2016).

Reparameterization trick

Let \(Z = (Z_1, \dotsc, Z_K)\). Let \(Z_k, k = 1, \dotsc, K\) be distributed from \(\mathrm{Gumbel}(0, 1)\), and the \(k\)th output of the function \(g_{\phi}: \mathbb R^K \to \mathbb R^K\) be: \begin{align} \left[{g_{\phi}(Z)}\right]_k := \frac{\exp((\log \alpha_k + Z_k)/\lambda)}{\sum_{i = 1}^K \exp((\log \alpha_i + Z_i)/\lambda)}. \end{align}

Hence, \(g_{\phi}(Z)\) is a Concrete random variable and we can write: \begin{align} \E_{X \sim \mathrm{Concrete}}[f(X)] &= \E_{Z \sim \prod_k \mathrm{Gumbel}(0, 1)}[f(g_{\phi}(Z))]. \end{align} Hence the derivatives of the objective will be: \begin{align} \frac{\partial}{\partial \phi} \E_{X \sim \mathrm{Concrete}}[f(X)] &= \frac{\partial}{\partial \phi} \E_{Z \sim \prod_k \mathrm{Gumbel}(0, 1)}[f(g_{\phi}(Z))] \\
&= \E_{Z \sim \prod_k \mathrm{Gumbel}(0, 1)}\left[\frac{\partial}{\partial \phi} \left(f(g_{\phi}(Z))\right)\right]. \end{align} This can be approximated by the Monte Carlo estimator: \begin{align} &\frac{1}{N} \sum_{n = 1}^N \frac{\partial}{\partial \phi} \left(f(g_{\phi}(Z^n))\right) \\
Z^n &:= (Z_1^n, \dotsc, Z_K^n) & n = 1, \dotsc, N \\
Z_k^n &\sim \mathrm{Gumbel}(0, 1) & k = 1, \dotsc, K, n = 1, \dotsc, N. \end{align}


References

  1. Maddison, C. J., Mnih, A., & Teh, Y. W. (2016). The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. ArXiv Preprint ArXiv:1611.00712.
    @article{maddison2016concrete,
      title = {The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables},
      author = {Maddison, Chris J. and Mnih, Andriy and Teh, Yee Whye},
      journal = {arXiv preprint arXiv:1611.00712},
      year = {2016}
    }
    
  2. Jang, E., Gu, S., & Poole, B. (2016). Categorical Reparameterization with Gumbel-Softmax. ArXiv Preprint ArXiv:1611.01144.
    @article{jang2016categorical,
      title = {Categorical Reparameterization with Gumbel-Softmax},
      author = {Jang, Eric and Gu, Shixiang and Poole, Ben},
      journal = {arXiv preprint arXiv:1611.01144},
      year = {2016}
    }
    

[back]