# REINFORCE trick

07 November 2016

This is a note about a Monte Carlo estimation method under various names: REINFORCE trick (Williams, 1992), score function estimator (Fu, 2006), likelihood-ratio estimator (Glynn, 1990).

Consider a random variable $$X: \Omega \to \mathcal X$$ whose distribution is parameterized by $$\phi$$; and a function $$f: \mathcal X \to \mathbb R$$. The goal is to approximate $$\frac{\partial}{\partial \phi} \E[f(X)]$$.

Let $$p_{\phi}(x)$$ be the density of $$X$$ with respect to the base measure $$\mathrm dx$$. Using the identity $$\frac{\partial}{\partial \phi} p_{\phi}(x) = p_{\phi}(x) \frac{\partial}{\partial \phi} \log p_{\phi}(x)$$, we get: \begin{align} \frac{\partial}{\partial \phi} \E[f(X)] &= \frac{\partial}{\partial \phi} \int_{\mathcal X} f(x) p_{\phi}(x) \,\mathrm dx \\
&= \int_{\mathcal X} f(x) \frac{\partial}{\partial \phi} p_{\phi}(x) \,\mathrm dx \\
&= \int_{\mathcal X} f(x) \frac{\partial}{\partial \phi} \log p_{\phi}(x) p_{\phi}(x) \,\mathrm dx \\
&= \E\left[f(x) \frac{\partial}{\partial \phi} \log p_{\phi}(x)\right]. \end{align}

Hence, $$\frac{\partial}{\partial \phi} \E[f(X)]$$ can be approximated by a Monte Carlo estimator: \begin{align} \frac{1}{N} \sum_{n = 1}^N f(X^n) \frac{\partial}{\partial \phi} \log p_{\phi}(X^n) && X_n \sim p_{\phi}, n = 1, \dotsc, N. \end{align}

Thus, we only need $$\log p_{\phi}(x)$$ to be differentiable with respect to $$\phi$$. This estimator applicable to a wide range of distributions of $$X$$ but suffers from high variance (why?).

References

1. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229–256.
@article{williams1992simple,
title = {Simple statistical gradient-following algorithms for connectionist reinforcement learning},
author = {Williams, Ronald J},
journal = {Machine learning},
volume = {8},
number = {3-4},
pages = {229--256},
year = {1992},
publisher = {Springer}
}

2. Fu, M. C. (2006). Gradient estimation. Handbooks in Operations Research and Management Science, 13, 575–616.
@article{fu2006gradient,
author = {Fu, Michael C},
journal = {Handbooks in operations research and management science},
volume = {13},
pages = {575--616},
year = {2006},
publisher = {Elsevier}
}

3. Glynn, P. W. (1990). Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33(10), 75–84.
@article{glynn1990likelihood,
title = {Likelihood ratio gradient estimation for stochastic systems},
author = {Glynn, Peter W},
journal = {Communications of the ACM},
volume = {33},
number = {10},
pages = {75--84},
year = {1990},
publisher = {ACM}
}


[back]