Review: Diffusion Models
| Paper Title | Denoising Diffusion Probabilistic Models |
| --- | --- |
| Informal Name | Diffusion Models |
| Date | 2020-07 |
| Link | https://arxiv.org/abs/2006.11239 |
Paper summary
- Introduction
    - There are several competitive approaches to image generative modeling, including GANs (adversarial training), variational autoencoders (VAEs, latent-variable models), and energy-based models.
    - This paper shows that diffusion models are capable of generating high-quality samples.
    - Diffusion models do not achieve competitive log-likelihoods despite superior sample quality.
- Methodology
    - Diffusion consists of a forward process (adding noise) and a reverse process (denoising).
    - The forward process, under some constraints (e.g., Gaussian noise with a fixed variance schedule), has a closed-form solution, so x_t at an arbitrary step can be sampled directly without simulating the Markov chain -> enables efficient training.
    - For the reverse process, training a model to predict the noise added at each step corresponds to a (reweighted) variational bound, while greatly simplifying the objective and enabling efficient training (see the sketch after this list).
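A minimal PyTorch-style sketch of these points: closed-form sampling of x_t, the simplified epsilon-prediction loss, and one reverse denoising step. The linear beta schedule matches the values reported in the paper, but the `model(x_t, t)` interface and the (B, C, H, W) tensor shapes are illustrative assumptions, not the authors' released code.

```python
import torch

# Linear beta schedule (the paper uses beta_1 = 1e-4 to beta_T = 0.02 over T = 1000 steps).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # abar_t = product of alphas up to step t

def q_sample(x0, t, noise):
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    x_t is sampled directly at any step t, without simulating the full Markov chain."""
    abar = alpha_bars[t].view(-1, 1, 1, 1)  # assumes image batches of shape (B, C, H, W)
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise

def simplified_loss(model, x0):
    """L_simple: the network eps_theta(x_t, t) is trained to predict the injected noise
    with a plain MSE, dropping the variational weighting terms."""
    t = torch.randint(0, T, (x0.shape[0],))  # uniform random timestep per example
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return torch.mean((model(x_t, t) - noise) ** 2)

@torch.no_grad()
def p_sample_step(model, x_t, t):
    """One reverse (denoising) step x_t -> x_{t-1}, using the paper's sigma_t^2 = beta_t choice."""
    t_batch = torch.full((x_t.shape[0],), t, dtype=torch.long)
    eps = model(x_t, t_batch)
    mean = (x_t - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:  # final step: return the mean without adding noise
        return mean
    return mean + betas[t].sqrt() * torch.randn_like(x_t)
```

Sampling a new image then amounts to drawing x_T from a standard Gaussian and applying `p_sample_step` for t = T-1 down to 0.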
- Experimental study
    - Ablation: predicting epsilon with the simplified objective outperformed all other settings (predicting mu, full variational objective).
    - Progressive coding: samples that share the same latent produce similar images.
    - Interpolation (see the sketch after this list):
        - Take pictures of two celebrities, x1 and x2.
        - Add noise to each picture (run the forward process) to obtain their latents x1' and x2'.
        - Interpolate between the two latents (pixel-wise weighted average) => x3'.
        - Denoise x3' to get a high-quality version of an "average" between the two celebrities.
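A sketch of this interpolation procedure, reusing the `q_sample` and `p_sample_step` helpers from the training sketch above; the noising depth `t` and interpolation weight `lam` are illustrative choices, not the paper's exact settings.

```python
import torch

@torch.no_grad()
def interpolate(model, x1, x2, t=500, lam=0.5):
    """Blend two images in the diffusion latent space.
    Assumes q_sample and p_sample_step from the training sketch above."""
    t_batch = torch.full((x1.shape[0],), t, dtype=torch.long)
    z1 = q_sample(x1, t_batch, torch.randn_like(x1))  # x1': noised version of x1
    z2 = q_sample(x2, t_batch, torch.randn_like(x2))  # x2': noised version of x2
    z = (1.0 - lam) * z1 + lam * z2                   # pixel-wise weighted average => x3'
    for step in reversed(range(t + 1)):               # denoise x3' back to a clean image
        z = p_sample_step(model, z, step)
    return z
```

Noising deeper (larger `t`) before interpolating gives the reverse process more freedom to blend the two faces, at the cost of fidelity to the originals.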
Paper Review
Short Summary
The paper describes an image generation process that learns a denoising function and ultimately uses it to generate new images from scratch. Conceptually, training begins with a forward process (data generation) that gradually adds noise to an image over hundreds of steps; a model is then trained on the generated data to learn the reverse process. The paper presents theoretical analysis as the basis for its methodology, along with experimental results demonstrating its advantages over contemporary methods. It also presents an exploration of how to condition the sampling process for various settings (e.g., progressive generation, interpolation).
Strengths
- Effective novel techniques were introduced, such as predicting epsilon and the simplified loss, which significantly improve model performance.
- Strong theoretical analysis as the basis for why denoising models can be so effective for generative modeling.
- The use of FID and Inception Score reflects advances in evaluating generative models since GANs were introduced.
- The code was released to the public -> helped the research community replicate and build upon the work.
- The investigations into progressive coding and interpolation were both insightful and applicable to real-life use cases.
Weaknesses
- Didn't include a discussion of training details such as batch size or computational requirements.
- Didn't go into detail on potential downstream use cases (e.g., conditioning on text).
Reflection
The diffusion model is a great piece of work with a solid theoretical basis, connections to multiple branches of generative modeling, and very strong results in practice. My hypothesis is that learning the incremental change (a denoising model) is more tractable and easier to train than learning the whole distribution from scratch, as GAN-based models do.
Most interesting thought/idea from reading this paper
The sampling process somewhat resembles the process of an artist: begin with the shapes, then some rough strokes for each region of the painting, and finally add the details. Maybe that is why diffusion models have a good inductive bias for image generation.