Tuan Anh Le

variational lossy autoencoder

19 April 2017

notes on (Chen et al., 2017).

understanding: 6/10
code: ?


the problem that this paper is tackling is often referred to as the optimization challenges of VAEs:

when the decoder \(p_{\theta}(y \given x)\) is too expressive, the encoder \(q_{\phi}(x \given y)\) just learns the prior \(p_{\theta}(x)\) instead of the posterior \(p_{\theta}(x \given y)\).

this is a problem because the VAE then doesn’t autoencode and the latents carry no information about \(y\).
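
to see why this happens, here is a sketch using the standard ELBO, which isn’t written out in these notes:

\[
\log p_{\theta}(y) \geq \mathbb{E}_{q_{\phi}(x \given y)}\left[\log p_{\theta}(y \given x)\right] - \mathrm{KL}\left(q_{\phi}(x \given y) \,\|\, p_{\theta}(x)\right).
\]

if the decoder ignores the latent, i.e. \(p_{\theta}(y \given x) = p_{\theta}(y)\) for every \(x\), the first term does not depend on \(q_{\phi}\), so the bound is maximized by driving the KL term to zero, which gives \(q_{\phi}(x \given y) = p_{\theta}(x)\).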

information theory

the paper makes an argument based on the code length of the joint code \((x, y)\). i don’t quite get it but the bottom line is:


we need a decoder \(p_{\theta}(y \given x)\) that cannot, by itself, model the information we want the latents to capture.

example: if we don’t want the latents to include info about texture, make the decoder learn the texture (e.g. using a PixelCNN that can only see locally); the encoder is then forced to learn the other things, like global shapes.
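
to make "can only see locally" concrete, here is a minimal sketch of which pixels a local autoregressive decoder may condition on. it assumes raster-scan ordering and a square window; the function name and the `radius` parameter are hypothetical, not taken from the paper.

```python
import numpy as np


def local_causal_context(height, width, row, col, radius):
    """Boolean mask of the pixels that pixel (row, col) may condition on.

    Assumption: raster-scan ordering and a square window of the given
    radius, mimicking a small-receptive-field PixelCNN-style decoder.
    """
    mask = np.zeros((height, width), dtype=bool)
    for r in range(max(0, row - radius), row + 1):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            # keep only pixels that come strictly before (row, col)
            # in raster-scan order
            if r < row or c < col:
                mask[r, c] = True
    return mask


mask = local_causal_context(height=28, width=28, row=14, col=14, radius=2)
print(mask.sum(), "of", mask.size, "pixels are visible")
```

with such a small window the decoder sees only a handful of nearby pixels, so anything global about the image has to come through the latent.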

bottom line: if we want to encode something, make sure our decoder can’t possibly decode that something just by itself.
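
for reference, a sketch of the code-length argument as it is usually stated for bits-back coding, in the notation of these notes (\(x\) latent, \(y\) observed). this is my reading, not a quote from the paper. the expected code length for \(y\) when coding through the latent is

\[
\mathcal{C}(y) = \mathbb{E}_{q_{\phi}(x \given y)}\left[\log q_{\phi}(x \given y) - \log p_{\theta}(x) - \log p_{\theta}(y \given x)\right] = -\log p_{\theta}(y) + \mathrm{KL}\left(q_{\phi}(x \given y) \,\|\, p_{\theta}(x \given y)\right) \geq -\log p_{\theta}(y).
\]

so going through the latent costs an extra \(\mathrm{KL}\left(q_{\phi}(x \given y) \,\|\, p_{\theta}(x \given y)\right)\) whenever the encoder is not the exact posterior. a decoder that can model \(y\) by itself avoids this cost by ignoring \(x\), which is why information ends up in the latent only if the decoder cannot model it.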

normalizing flows

make the prior powerful by using normalizing flows. then use decoders that can only capture local variations.
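
a minimal sketch of evaluating the log-density of a flow-based prior, assuming an affine autoregressive flow with a standard normal base and toy linear conditioners (a stand-in for a masked neural network). all function and parameter names here are hypothetical.

```python
import numpy as np


def standard_normal_logpdf(eps):
    """Elementwise log-density of N(0, 1)."""
    return -0.5 * (eps ** 2 + np.log(2.0 * np.pi))


def autoregressive_flow_log_prior(x, w_mu, w_log_sigma):
    """log p(x) under an affine autoregressive flow with a N(0, I) base.

    eps_i = (x_i - mu_i(x_{<i})) / sigma_i(x_{<i}), where mu and log sigma
    are strictly lower-triangular linear maps of x (a toy assumption).
    """
    # zero out the diagonal and upper triangle so that dimension i
    # depends only on x_{<i}
    w_mu = np.tril(w_mu, k=-1)
    w_log_sigma = np.tril(w_log_sigma, k=-1)
    mu = w_mu @ x
    log_sigma = w_log_sigma @ x
    eps = (x - mu) * np.exp(-log_sigma)
    # change of variables: the Jacobian d eps / d x is lower triangular
    # with diagonal exp(-log_sigma), so log|det| = -sum(log_sigma)
    return standard_normal_logpdf(eps).sum() - log_sigma.sum()


rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=d)
w_mu = rng.normal(size=(d, d))
w_log_sigma = 0.1 * rng.normal(size=(d, d))
print(autoregressive_flow_log_prior(x, w_mu, w_log_sigma))
```

with all weights set to zero this reduces to a standard normal prior; nonzero weights let the prior model dependencies between latent dimensions.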


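the paper reports state-of-the-art density estimation results on MNIST, OMNIGLOT and Caltech-101 Silhouettes, and competitive results on CIFAR10.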


  1. Chen, X., Kingma, D. P., Salimans, T., Duan, Y., Dhariwal, P., Schulman, J., Sutskever, I., & Abbeel, P. (2017). Variational Lossy Autoencoder. International Conference on Learning Representations (ICLR).