# importance weighted autoencoders

*19 February 2017*

notes on Burda et al. (2016).

## summary

understanding: 8/10

code: https://github.com/yburda/iwae

importance weighted autoencoders (iwae) improve on variational autoencoders (vae).
the main difference is in the loss function.
iwae uses:
\begin{align}
\mathcal L_K = \E_{x_1, \dotsc, x_K \sim q_{\phi}(x \given y)}\left[\log \frac{1}{K} \sum_{k = 1}^K w_k\right]
\end{align}
where \(w_k = p_{\theta}(x_k, y) / q_{\phi}(x_k \given y)\) and the \(x_k\) are i.i.d. draws from \(q_{\phi}(x \given y)\).
the vae objective is the special case \(K = 1\).
both objectives are lower bounds on \(\log p_{\theta}(y)\).
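
why these are lower bounds (a quick jensen's-inequality sketch in this note's notation; the paper states this more carefully): \(\frac{1}{K} \sum_k w_k\) is an unbiased importance-sampling estimator of \(p_{\theta}(y)\), and \(\log\) is concave, so
\begin{align}
\mathcal L_K = \E\left[\log \frac{1}{K} \sum_{k = 1}^K w_k\right] \le \log \E\left[\frac{1}{K} \sum_{k = 1}^K w_k\right] = \log p_{\theta}(y).
\end{align}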

iwae is better because:

- the bound becomes tighter as \(K\) grows, and converges to \(\log p_{\theta}(y)\) as \(K \to \infty\) (see the toy check after this list).
- iwae *possibly* uses the neural network's modelling capacity better (more active latent units).
- experimentally, iwae models reach higher test \(\log p_{\theta}(y)\) (estimated by importance sampling with 5000 particles) than vae models.
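
a minimal numerical sketch of the first point (my own toy example, not from the paper or the authors' repo): a one-dimensional linear-gaussian model where \(\log p_{\theta}(y)\) is available in closed form, with a deliberately crude proposal so the \(K = 1\) bound (the vae elbo) is visibly loose.

```python
# toy check that L_K increases with K towards log p(y).
# model (my choice, not from the paper): x ~ N(0, 1), y | x ~ N(x, 1),
# so the marginal is p(y) = N(y; 0, 2).
# proposal: q(x | y) = N(0, 1), i.e. just the prior -- deliberately crude.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(0)
y = 1.5

def iwae_bound(K, n_batches=50_000):
    # K i.i.d. samples x_k ~ q(x | y) per batch
    x = rng.standard_normal((n_batches, K))
    # log w_k = log p(x_k, y) - log q(x_k | y); with q equal to the prior,
    # the prior terms cancel and log w_k = log p(y | x_k)
    log_w = norm.logpdf(y, loc=x, scale=1.0)
    # L_K = E[log (1/K) sum_k w_k], Monte Carlo average over batches
    return np.mean(logsumexp(log_w, axis=1) - np.log(K))

for K in (1, 10, 100):
    print(f"L_{K} ~ {iwae_bound(K):.4f}")
print(f"log p(y) = {norm.logpdf(y, loc=0.0, scale=np.sqrt(2.0)):.4f}")
```

the printed bounds should increase monotonically with \(K\) towards the closed-form \(\log p(y)\).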

## references

- Burda, Y., Grosse, R., & Salakhutdinov, R. (2016). Importance Weighted Autoencoders.
*International Conference on Learning Representations (ICLR)*.
```
@inproceedings{burda2016importance,
  title     = {Importance Weighted Autoencoders},
  author    = {Burda, Yuri and Grosse, Roger and Salakhutdinov, Ruslan},
  year      = {2016},
  booktitle = {International Conference on Learning Representations (ICLR)}
}
```
