If you are here, you probably know variational autoencoders are a different animal than vanilla autoencoders. It took me a while to get my head around them. The number of blog posts on the topic is itself evidence that I wasn't alone in my struggle. What really helped me understand them was this lecture by Ali Ghodsi of the University of Waterloo. He does an excellent job explaining the variational method and deriving the mathematics step by step. But he doesn't get to training and the “reparameterization trick”.
Since the network has random sampling at its heart, it is not obvious at first how to do error backpropagation. The “reparameterization trick” is a clever way of getting the random sampling out of the error backpropagation path. To understand it, just compare the two graphs in the hero image at the top of this page, after you've watched Ali Ghodsi's lecture.
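If you prefer code to graphs, here is a minimal sketch of the idea in plain Python. It is not taken from my training script; the function name `sample_reparameterized` and the numbers are made up for illustration. Instead of drawing z straight from a Gaussian with mean \(\mu\) and standard deviation \(\sigma\), we draw noise \(\epsilon\) from a standard Gaussian and compute \(z = \mu + \sigma \epsilon\). With \(\epsilon\) held fixed, z is an ordinary function of \(\mu\) and \(\sigma\), so gradients can flow through it.

```python
import random

def sample_reparameterized(mu, sigma, eps):
    # All the randomness lives in eps ~ N(0, 1); mu and sigma enter
    # through a plain, differentiable formula.
    return mu + sigma * eps

rng = random.Random(0)
eps = rng.gauss(0.0, 1.0)   # the only random draw
mu, sigma = 0.5, 2.0        # illustrative values, not from a trained model

z = sample_reparameterized(mu, sigma, eps)

# With eps fixed, the gradients are dz/dmu = 1 and dz/dsigma = eps.
# Check them numerically with a small finite difference.
h = 1e-6
dz_dmu = (sample_reparameterized(mu + h, sigma, eps) - z) / h
dz_dsigma = (sample_reparameterized(mu, sigma + h, eps) - z) / h

print("dz/dmu    ~", dz_dmu)
print("dz/dsigma ~", dz_dsigma, "(eps =", eps, ")")
```

In a real framework the finite differences are replaced by automatic differentiation, but the point is the same: the sampling node sits off to the side and no gradient has to pass through it.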
I guess you are here because I promised an interactive demo, so let's get to it.
First a little bit about what you are seeing.
In my quest to understand VAEs, I coded one up from scratch using PyTorch. (GitHub link here) My script has a little function that saves the model parameters (i.e. weights and biases) to a JSON file. I used it to save the parameters after training. Those parameters are now loaded into Deeplearn.js variables and used to create the interactive visualizations in this demo. So the VAE is running in your browser!
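For the curious, exporting parameters to JSON can look roughly like the sketch below. This is an illustration rather than the exact function from my script: `params_to_json`, the parameter names, and the file name are all hypothetical. It assumes each parameter is either a nested list or an object with a `.tolist()` method, which is the case for PyTorch tensors, so with a PyTorch model you could pass in `model.state_dict()`.

```python
import json
import os
import tempfile

def params_to_json(named_params, path):
    """Write a {name: array-like} mapping to a JSON file."""
    serializable = {}
    for name, value in named_params.items():
        # Tensors and arrays expose .tolist(); plain lists pass through as-is.
        serializable[name] = value.tolist() if hasattr(value, "tolist") else value
    with open(path, "w") as f:
        json.dump(serializable, f)
    return serializable

# Toy stand-in for a trained model's weights and biases (hypothetical names).
toy_params = {
    "decoder.fc.weight": [[0.1, -0.2], [0.3, 0.4]],
    "decoder.fc.bias": [0.0, 0.5],
}

path = os.path.join(tempfile.gettempdir(), "vae_params_example.json")
params_to_json(toy_params, path)

# Reading the file back gives nested lists again, ready to be handed
# to whatever runs the model in the browser.
with open(path) as f:
    loaded = json.load(f)
print(sorted(loaded.keys()))
```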
The 10 sliders represent z, the latent variables that are fed to the decoder network. Normally, they are generated by sampling from a distribution whose parameters are spit out by the encoder network. But here, you can also move the sliders yourself to see digits generated by the decoder network morph into other digits. This gradual morphing of digits is a feature of VAEs: it is the result of a smooth latent space, which is enforced by random sampling from a distribution.
The 10 little tiles you see at the bottom left are example reconstructed images. When you click on them, a new z is sampled from the same distribution that the image was sampled from. Then the slider positions are set to the newly sampled value of z, and a new image is generated by the decoder network. So by clicking a digit repeatedly, you'll see new variations of it being generated. This is why VAEs are called generative models.
The more fun part is drawing the input yourself. When you use the drawing pad, your drawing is first encoded by the encoder network. The encoder network generates 10 pairs of \(\mu\) and \(\sigma\), one pair per latent variable. Here, for simplicity, instead of sampling from the distributions, I throw out the \(\sigma\) and pass \(\mu\) directly to the decoder network.
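The two ways of getting z in this demo (sampling when you click a tile, taking the mean when you draw) can be sketched in a few lines of Python. This is not the demo's Deeplearn.js code; `latent_from_encoder` and the values of `mu` and `sigma` are made up for illustration, and I'm assuming the encoder outputs \(\sigma\) itself, as described above.

```python
import random

def latent_from_encoder(mu, sigma, rng=None):
    """Turn per-dimension (mu, sigma) into a latent vector z.

    With rng given, sample z_i = mu_i + sigma_i * eps_i (the tile-click case).
    With rng=None, ignore sigma and return z = mu (the drawing-pad case).
    """
    if rng is None:
        return list(mu)
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

# Pretend encoder output for one image: 10 means and 10 standard deviations.
mu = [0.0, 1.0, -1.0] + [0.0] * 7
sigma = [0.1] * 10

z_mean = latent_from_encoder(mu, sigma)                  # deterministic
z_a = latent_from_encoder(mu, sigma, random.Random(1))   # one "click"
z_b = latent_from_encoder(mu, sigma, random.Random(2))   # another "click"

print(z_mean)
print(z_a)
print(z_b)
```

Each sampled z lands near \(\mu\) but not exactly on it, which is why repeated clicks give slightly different versions of the same digit, while the drawing pad always gives the same reconstruction for the same drawing.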
Try drawing some digits, and look at how the reconstructed image changes as you draw. You can try to draw things other than digits, but you'll see that it has difficulty reconstructing them, or that the reconstructed image ends up looking like a digit.
Have fun playing around, and share your observations :)
P.S. the hero image is taken from the Tutorial on Variational Autoencoders paper which I highly recommend.