Review: Diffusion Models
| Paper Title | Denoising Diffusion Probabilistic Models |
| --- | --- |
| Informal Name | Diffusion Models |
| Date | 2020-07 |
| Link | https://arxiv.org/abs/2006.11239 |
Paper summary
- Introduction
    - There are several competitive approaches to image generative modeling, including GANs (adversarial training), variational autoencoders (VAEs, latent-variable models), and energy-based models.
    - This paper shows that diffusion models are capable of generating high-quality samples.
    - Diffusion models do not achieve competitive log-likelihoods despite superior sample quality.
- Methodology
    - Diffusion consists of a forward process (adding noise) and a reverse process (denoising).
    - The forward process, under some constraints (e.g., Gaussian noise with a fixed variance schedule), has a closed-form solution, so x_t at an arbitrary step can be sampled directly without simulating the Markov chain -> enables efficient training.
    - For the reverse process, training a model to predict the noise added at each step corresponds to a (reweighted) variational bound, while greatly simplifying the objective and enabling efficient training (see the sketch after this list).
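A minimal PyTorch-style sketch of these points: closed-form sampling of x_t, the simplified epsilon-prediction loss, and one reverse denoising step. The linear beta schedule matches the values reported in the paper, but the `model(x_t, t)` interface and the (B, C, H, W) tensor shapes are illustrative assumptions, not the authors' released code.

```python
import torch

# Linear beta schedule (the paper uses beta_1 = 1e-4 to beta_T = 0.02 over T = 1000 steps).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # abar_t = product of alphas up to step t

def q_sample(x0, t, noise):
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    x_t is sampled directly at any step t, without simulating the full Markov chain."""
    abar = alpha_bars[t].view(-1, 1, 1, 1)  # assumes image batches of shape (B, C, H, W)
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise

def simplified_loss(model, x0):
    """L_simple: the network eps_theta(x_t, t) is trained to predict the injected noise
    with a plain MSE, dropping the variational weighting terms."""
    t = torch.randint(0, T, (x0.shape[0],))  # uniform random timestep per example
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return torch.mean((model(x_t, t) - noise) ** 2)

@torch.no_grad()
def p_sample_step(model, x_t, t):
    """One reverse (denoising) step x_t -> x_{t-1}, using the paper's sigma_t^2 = beta_t choice."""
    t_batch = torch.full((x_t.shape[0],), t, dtype=torch.long)
    eps = model(x_t, t_batch)
    mean = (x_t - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:  # final step: return the mean without adding noise
        return mean
    return mean + betas[t].sqrt() * torch.randn_like(x_t)
```

Sampling a new image then amounts to drawing x_T from a standard Gaussian and applying `p_sample_step` for t = T-1 down to 0.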
- Experimental study
    - Ablation: predicting epsilon with the simplified objective outperformed all other settings (predicting mu, full variational objective).
    - Progressive coding: samples that share the same latent produce similar images.
    - Interpolation (see the sketch after this list):
        - Take pictures of two celebrities, x1 and x2.
        - Add noise to each picture (run the forward process) to obtain their latents x1' and x2'.
        - Interpolate between the two latents (pixel-wise weighted average) => x3'.
        - Denoise x3' to get a high-quality version of an "average" between the two celebrities.
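A sketch of this interpolation procedure, reusing the `q_sample` and `p_sample_step` helpers from the training sketch above; the noising depth `t` and interpolation weight `lam` are illustrative choices, not the paper's exact settings.

```python
import torch

@torch.no_grad()
def interpolate(model, x1, x2, t=500, lam=0.5):
    """Blend two images in the diffusion latent space.
    Assumes q_sample and p_sample_step from the training sketch above."""
    t_batch = torch.full((x1.shape[0],), t, dtype=torch.long)
    z1 = q_sample(x1, t_batch, torch.randn_like(x1))  # x1': noised version of x1
    z2 = q_sample(x2, t_batch, torch.randn_like(x2))  # x2': noised version of x2
    z = (1.0 - lam) * z1 + lam * z2                   # pixel-wise weighted average => x3'
    for step in reversed(range(t + 1)):               # denoise x3' back to a clean image
        z = p_sample_step(model, z, step)
    return z
```

Noising deeper (larger `t`) before interpolating gives the reverse process more freedom to blend the two faces, at the cost of fidelity to the originals.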
Paper Review
Short Summary
The paper describes an image generation process that learns a denoising function and ultimately uses it to generate new images from scratch. Conceptually, training begins with a forward process (data generation) that gradually adds noise to an image over hundreds of steps; a model is then trained on the generated data to learn the reverse process. The paper presents theoretical analysis as the basis for its methodology, along with experimental results demonstrating its advantages over contemporary methods. It also presents an exploration of how to condition the sampling process for various settings (e.g., progressive generation, interpolation).
Strengths
- Effective novel techniques were introduced, such as predicting epsilon and the simplified loss, which significantly improve model performance.
- Strong theoretical analysis as the basis for why denoising models can be so effective for generative modeling.
- The use of FID and Inception Score reflects advances in evaluating generative models since GANs were introduced.
- The code was released to the public -> helped the research community replicate and build upon the work.
- The investigations into progressive coding and interpolation were both insightful and applicable to real-life use cases.
Weaknesses
- Didn't include a discussion of training details such as batch size or computational requirements.
- Didn't go into detail on potential downstream use cases (e.g., conditioning on text).
Reflection
The diffusion model is a great piece of work with a solid theoretical basis, connections to multiple branches of generative modeling, and very strong results in practice. My hypothesis is that learning the incremental change (a denoising model) is more tractable and easier to train than learning the whole distribution from scratch, as GAN-based models do.
Most interesting thought/idea from reading this paper
The sampling process somewhat resembles the process of an artist: begin with the shapes, then some rough strokes for each region of the painting, and finally add the details. Maybe that is why diffusion models have a good inductive bias for image generation.