An autoencoder learns patterns from unlabeled data through unsupervised learning.

It works by compressing the original data sample into a simpler representation (Encoder → embedding into vectors), then trying to reconstruct the original from that representation (Decoder), and lastly comparing the two. The difference between them is called the reconstruction error.
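The Encoder → embedding → Decoder loop can be sketched as a tiny linear autoencoder. This is a minimal illustration with assumed shapes, initialization, and learning rate (none of which come from the source); it trains by gradient descent on the mean squared reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Constrained" data: the last two features depend on the first two,
# so a 2-D embedding can capture almost all of the structure.
t = rng.normal(size=(200, 2))
X = np.hstack([t, t + 0.1 * rng.normal(size=(200, 2))])

W_enc = rng.normal(scale=0.5, size=(4, 2))  # encoder weights (4-D -> 2-D)
W_dec = rng.normal(scale=0.5, size=(2, 4))  # decoder weights (2-D -> 4-D)
lr = 0.05

init_err = np.mean((X @ W_enc @ W_dec - X) ** 2)
for _ in range(3000):
    Z = X @ W_enc                         # encoder: embed into 2-D vectors
    X_hat = Z @ W_dec                     # decoder: reconstruct the original
    err = X_hat - X                       # per-element reconstruction error
    grad_dec = Z.T @ err / len(X)         # gradient w.r.t. decoder weights
    grad_enc = X.T @ (err @ W_dec.T) / len(X)  # gradient w.r.t. encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_err = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

Because the 4-D data really only has 2-D structure, the reconstruction error drops close to the noise level even though the embedding is half the size of the input.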

Random data has no structure, so simplifying it would not be possible. But real-world data is constrained: it contains patterns that a compressed representation can capture.

We force the model’s encoder and decoder to “work together” by compressing the data so much that information loss is unavoidable, so the model must keep only the most important structure.

Denoising: purposefully adding noise to the input, but training against the original (non-noisy) samples
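The denoising setup above can be sketched as follows (illustrative names and noise level, not from the source): the input is corrupted with Gaussian noise, but the reconstruction is scored against the clean original, so simply copying the noisy input through is penalized.

```python
import numpy as np

rng = np.random.default_rng(1)

def reconstruction_error(x_hat, target):
    return float(np.mean((x_hat - target) ** 2))

X_clean = rng.normal(size=(100, 4))
X_noisy = X_clean + 0.3 * rng.normal(size=(100, 4))  # corrupted input

# A "model" that just copies its input cannot score zero here,
# because the target is the clean sample, not the noisy one.
copy_score = reconstruction_error(X_noisy, X_clean)

# Training would minimize reconstruction_error(model(X_noisy), X_clean),
# forcing the model to learn the underlying structure rather than the noise.
```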

  • Feature Extractor: Train encoder and decoder together, then cut off the decoder to see which features the embedding actually captures

  • Anomaly Scoring: Finding outliers by looking at the reconstruction error → if the model hasn’t seen a pattern yet, it can’t reconstruct it reliably, so the error is high

  • Missing Value Imputation: a special case of denoising, with the missing values treated as the noise to be reconstructed
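The anomaly-scoring idea can be sketched with PCA standing in for a trained linear encoder/decoder (an assumption for illustration only): the training data lies near a line in 2-D, and a point off that line gets a high reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(2)

# Training data lies near a 1-D line in 2-D (the "seen" pattern).
t = rng.normal(size=(300, 1))
X_train = np.hstack([t, 2 * t]) + 0.05 * rng.normal(size=(300, 2))

# PCA via SVD as a stand-in for a trained 1-D bottleneck.
X_mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - X_mean, full_matrices=False)
v = Vt[:1]                                  # principal direction

def anomaly_score(x):
    z = (x - X_mean) @ v.T                  # encode into the bottleneck
    x_hat = z @ v + X_mean                  # decode back to 2-D
    return float(np.sum((x - x_hat) ** 2))  # reconstruction error

normal_score = anomaly_score(np.array([1.0, 2.0]))    # fits the seen pattern
outlier_score = anomaly_score(np.array([2.0, -4.0]))  # pattern never seen
```

A point on the learned line reconstructs almost perfectly, while the off-line point cannot be represented in the bottleneck, so its reconstruction error (and thus its anomaly score) is much higher.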

  • Keywords: machine-learning

  • Source: Simple Explanation of AutoEncoders - YouTube

  • Related: Unsupervised Learning