Autoencoders

Structure

Autoencoders are composed of two neural networks, the encoder and the decoder. The encoder compresses the input into a lower-dimensional code. This code is a compact representation of the input, also known as the latent-space representation. The decoder reconstructs the original input from the code.
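
The sketch below illustrates this structure. It is a minimal example in PyTorch with illustrative layer sizes, assuming 28x28 images flattened to 784-dimensional inputs; the names and dimensions are assumptions, not part of the original page.

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compresses the input down to the latent code (bottleneck).
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstructs the input from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),  # assumes inputs are scaled to [0, 1]
        )

    def forward(self, x):
        code = self.encoder(x)      # latent-space representation
        return self.decoder(code)   # reconstruction of the input
```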

An autoencoder is a self-supervised learning algorithm. Rather than training on labeled samples, an autoencoder is trained by feeding an input to the encoder and obtaining a reconstruction of that input from the decoder, which can then be compared to the original input. The loss function is the reconstruction error; one example is the L2 loss between an input image and its reconstruction. By forcing the information through a bottleneck of smaller dimensionality than the input, the encoder must learn a transformation to a code in the latent space that preserves informative features of the input while discarding uninformative ones. The decoder must then learn a function from the code back up to the dimensionality of the input space.
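
A hedged training sketch follows, using the Autoencoder class above. It assumes a DataLoader named train_loader that yields (image, label) batches with pixel values in [0, 1]; the labels are ignored because the reconstruction target is the input itself.

```python
import torch

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()  # L2 reconstruction error

for epoch in range(10):
    for images, _ in train_loader:           # labels are unused
        x = images.view(images.size(0), -1)  # flatten each image to a vector
        x_hat = model(x)                     # encode, then decode
        loss = criterion(x_hat, x)           # compare reconstruction to input
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```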

If the bottleneck is smaller than the input, the autoencoder is undercomplete. If the bottleneck is larger than the input, the autoencoder is overcomplete and can learn to copy the input to the output even when restricted to linear functions. If the latent code size is too small, the autoencoder cannot capture enough information about the input to reconstruct it properly. If the latent code size is too large, the representation is not as compressed as it could be, and the network can retain information that is irrelevant to the encoding task. The code size is therefore a hyperparameter to be tuned, and part of training is finding a usefully compact representation.

The latent space of a plain autoencoder is not regularized: training inputs map to scattered points rather than filling the space, so there may be gaps where latent codes do not decode to meaningful images.

Properties

  • data specific: autoencoders only compress data similar to what they saw in the training set. If you train on MNIST and test on a picture of a cat, you would not expect the autoencoder to perform well.
  • lossy: because the input is compressed down to a smaller dimension, it cannot be recovered perfectly, though a well-trained decoder can recover something close.
  • self-supervised: you do not need to tell it what to learn. You can easily train a new autoencoder on new data just by choosing a useful loss function and hitting play.

Uses

Because autoencoders are a form of unsupervised learning, they can directly consume large amounts of unlabeled data. You can then take the encoder, which has learned a transformation of the input down to a useful code, and use that code as the input for another task, such as classification. The code should be useful for this task as well.
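
One way to reuse the encoder is sketched below, continuing the assumed model from the earlier sketches; the classifier head, the number of classes, and the choice to freeze the encoder are illustrative assumptions, not part of the original page.

```python
import torch.nn as nn

encoder = model.encoder
for p in encoder.parameters():
    p.requires_grad = False  # optionally freeze the learned representation

classifier = nn.Sequential(
    encoder,
    nn.Linear(32, 10),  # 32 = latent_dim above, 10 classes (e.g. digits)
)
# classifier can now be trained on labeled data with a cross-entropy loss,
# using the latent code as its input features.
```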

  • data denoising: an autoencoder can be trained to denoise images by adding noise to the input images while measuring the reconstruction loss against the original, clean images (see the sketch after this list)
  • image reconstruction
  • image colorization
  • dimensionality reduction for data visualization: a neural network learns important features rather than extracting them deterministically through a method like PCA
  • data compression
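
A minimal denoising sketch, reusing the assumed model, optimizer, criterion, and train_loader from the training sketch above: Gaussian noise corrupts the input, but the loss is measured against the clean image, so the network learns to remove the noise.

```python
import torch

noise_std = 0.3  # corruption strength; a tunable hyperparameter
for images, _ in train_loader:
    x = images.view(images.size(0), -1)
    x_noisy = (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)
    x_hat = model(x_noisy)       # reconstruct from the corrupted input
    loss = criterion(x_hat, x)   # compare against the original, clean input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```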