19 April 2017
notes on (Chen et al., 2017).
understanding: 6/10
code: ?
the problem that this paper is tackling is often referred to as the optimization challenges of VAEs:
when the decoder \(p_{\theta}(y \given x)\) is too expressive, the encoder \(q_{\phi}(x \given y)\) just learns the prior \(p_{\theta}(x)\) instead of the posterior \(p_{\theta}(x \given y)\).
this is a problem because the VAE then doesn’t autoencode and the latents carry no information about \(y\).
the paper argues this using the code length of the joint code \((x, y)\). i don’t quite get it but the bottom line is:
we need a decoder \(p_{\theta}(y \given x)\) such that
example: if we don’t want the latents to include info about texture, make the decoder model the texture itself (e.g. using a pixelcnn that can only see locally); then the encoder will be forced to learn the other things, like global shapes.
bottom line: if we want to encode something, make sure our decoder can’t possibly decode that something just by itself.
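a rough sketch of how the code-length argument seems to go, assuming the bits-back coding view (this is my reconstruction in the notation used here, with \(x\) the latent and \(y\) the data, so treat it as a reading aid rather than the paper's exact statement). the expected code length for sending \(y\) via the latent \(x\) is the negative ELBO:

\[
\begin{aligned}
\mathcal{C}_{\text{BitsBack}}
&= \mathbb{E}_{y \sim \text{data},\, x \sim q_{\phi}(x \given y)}\left[\log q_{\phi}(x \given y) - \log p_{\theta}(x) - \log p_{\theta}(y \given x)\right] \\
&= \mathbb{E}_{y \sim \text{data}}\left[-\log p_{\theta}(y) + \mathrm{KL}\left(q_{\phi}(x \given y) \,\|\, p_{\theta}(x \given y)\right)\right] \\
&\geq \mathcal{H}(\text{data}) + \mathbb{E}_{y \sim \text{data}}\left[\mathrm{KL}\left(q_{\phi}(x \given y) \,\|\, p_{\theta}(x \given y)\right)\right]
\end{aligned}
\]

the second line uses \(p_{\theta}(x)\, p_{\theta}(y \given x) = p_{\theta}(y)\, p_{\theta}(x \given y)\), and the last line uses that cross-entropy is at least the entropy. if the decoder is expressive enough to reach \(\mathcal{H}(\text{data})\) without looking at \(x\), then using \(x\) can only add the KL term, which is zero only when \(q_{\phi}(x \given y) = p_{\theta}(x \given y) = p_{\theta}(x)\). so the cheapest code ignores the latent, which is the collapse described above.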
make the prior powerful by using normalizing flows. then use decoders that can only capture local variations.
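a minimal sketch of what "can only see locally" means for a pixelcnn-style decoder, assuming raster-scan masked convolutions with stride 1 and no dilation, and ignoring colour-channel ordering. the names `causal_mask` and `rows_visible_above` are made up for illustration; this is not the paper's code.

```python
def causal_mask(k, include_center):
    """Return a k x k 0/1 mask for a raster-scan (PixelCNN-style) convolution.

    Rows above the centre are fully visible; in the centre row only the
    pixels to the left are visible. include_center=False is what PixelCNN
    calls a type-A mask (first layer), True a type-B mask (later layers).
    """
    assert k % 2 == 1, "kernel size must be odd"
    c = k // 2
    mask = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            above = i < c
            left_in_row = i == c and j < c
            center = include_center and i == c and j == c
            if above or left_in_row or center:
                mask[i][j] = 1
    return mask


def rows_visible_above(k, n_layers):
    """How many rows above the current pixel a stack of n_layers masked
    k x k convolutions can reach (each layer looks up k // 2 rows)."""
    return n_layers * (k // 2)


if __name__ == "__main__":
    for row in causal_mask(3, include_center=False):
        print(row)
    # a shallow stack only reaches a few rows up, so anything global
    # (like overall shape) has to come from the latent code instead.
    print(rows_visible_above(k=3, n_layers=4))
```

the design knob is the pair (kernel size, depth): keeping both small keeps the window small, which is one way to stop the decoder from modelling global structure by itself.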
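results: the paper reports state-of-the-art density estimation on MNIST, OMNIGLOT and Caltech-101 Silhouettes, and competitive results on CIFAR10.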
@inproceedings{chen2017variational,
  title = {Variational Lossy Autoencoder},
  author = {Chen, Xi and Kingma, Diederik P. and Salimans, Tim and Duan, Yan and Dhariwal, Prafulla and Schulman, John and Sutskever, Ilya and Abbeel, Pieter},
  year = {2017},
  booktitle = {International Conference on Learning Representations (ICLR)}
}