
Paper Title: Generative Adversarial Nets
Date: 2014-06
Link: https://arxiv.org/abs/1406.2661

Paper summary

GANs are better than VAEs (Variational Auto-Encoders) because:

  • For VAEs, the reconstruction loss (e.g. MSE) is a terrible objective function for images: two images that look the same to a human can have wildly different pixel values.
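A toy illustration of that point (my own example, not from the paper): shifting an image by a single pixel leaves it perceptually unchanged, yet the per-pixel MSE against the original can be large.

```python
import numpy as np

# Two "images" that differ only by a one-pixel horizontal shift look the
# same to a human, but their per-pixel MSE is far from zero.
rng = np.random.default_rng(0)
img = rng.random((32, 32))          # random grayscale "texture" image
shifted = np.roll(img, 1, axis=1)   # same texture, shifted one pixel right

mse_shifted = float(np.mean((img - shifted) ** 2))
mse_identical = float(np.mean((img - img) ** 2))
print("MSE vs. shifted copy:", mse_shifted)   # large, even though the images match
print("MSE vs. itself:      ", mse_identical)  # exactly 0
```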

Paper reading notes

Spontaneous Questions

Q1: What are the ways to train this model/keep D from corrupting G?
A1:

Q2: Deep generative models run into the difficulty of intractable probabilistic computations in MLE.
A2:

Paper Review

Summary

The paper proposes a framework for estimating generative models via an adversarial process in which two models are trained simultaneously: a generative model G and a discriminative model D. To sidestep the intractable probability computations that arise in MLE fitting, GANs train G not with gradients computed from a prediction-vs-data loss, but with gradients flowing through the discriminator. G and D must maintain a similar level of competency throughout training so that neither collapses to a trivial solution. The paper presents a theoretical analysis with a proof of convergence for the algorithm and highlights the potential of this framework for generative modeling.
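The adversarial process described above is the paper's two-player minimax game:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
$$

D is trained to tell real data from G's samples, while G is trained to make D fail; in practice the paper alternates k gradient steps on D with one step on G.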

Strengths

  • Innovative framework for training generative models, which went on to influence many later models.
  • Thorough theoretical analysis, including a proof of convergence, to back up how the model works.
  • Experimental comparison with other contemporary image generative models to highlight its advantages.

Weaknesses

  • Experimental results were based on Gaussian Parzen-window likelihood estimation, which was not the most reliable evaluation method available.
  • Model architectures are simple (MLPs), even though CNNs were already popular in 2014.
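For context, the Parzen-window evaluation mentioned above can be sketched as follows (a rough sketch with made-up data and bandwidth, not the paper's exact setup): fit a Gaussian kernel density on generated samples and report the mean log-likelihood it assigns to held-out data.

```python
import numpy as np

def parzen_log_likelihood(samples, data, sigma=0.2):
    """Mean log-likelihood of `data` under a Gaussian kernel density
    centred on generated `samples`. samples: (n, d), data: (m, d)."""
    n, d = samples.shape
    # Squared distances between every data point and every sample: (m, n)
    diffs = data[:, None, :] - samples[None, :, :]
    sq = np.sum(diffs ** 2, axis=-1)
    log_kernel = -sq / (2 * sigma ** 2)
    # Gaussian normaliser plus the 1/n mixture weight
    log_norm = np.log(n) + 0.5 * d * np.log(2 * np.pi * sigma ** 2)
    # Numerically stable log-sum-exp over the n kernels
    mx = log_kernel.max(axis=1, keepdims=True)
    lse = mx.squeeze(1) + np.log(np.exp(log_kernel - mx).sum(axis=1))
    return float(np.mean(lse - log_norm))

rng = np.random.default_rng(0)
gen = rng.normal(0, 1, size=(500, 2))   # stand-in for generated samples
test = rng.normal(0, 1, size=(100, 2))  # held-out data from the same distribution
far = rng.normal(5, 1, size=(100, 2))   # data far from the generated samples
print("held-out LL:", parzen_log_likelihood(gen, test))
print("far-away LL:", parzen_log_likelihood(gen, far))
```

Its instability is easy to see here: the estimate depends heavily on the bandwidth `sigma` and on how many samples happen to fall near the test points.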

Reflection

Compared to other generative models, such as diffusion models for images and autoregressive transformers for text, GANs seem to be the only surviving approach that attempts to learn the data distribution directly. The other two, arguably more successful, families (diffusion and autoregressive LMs) instead try to learn the distribution of the data given some priors. They are also supervised directly by the data rather than by a discriminator network. Maybe time has proven those approaches better; maybe GANs can still make a comeback someday.

Most interesting thought/idea from reading this paper

By building and training generative models, are we learning the fabric of reality itself? A religious person might think we are learning to do God's work - Creation. It's a silly idea, but somehow it's the most interesting one I took away from reading this paper.
