
Paper Title Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Authors Saharia & Chan et al.
Date 2022-05
Link https://arxiv.org/pdf/2205.11487.pdf

Paper Review

Short Summary

Imagen is a text-to-image diffusion model that achieves strong text–image alignment and image fidelity. It has three stages: a base text-to-image model and two super-resolution models. The base model uses a frozen T5-XXL as its text encoder and a U-Net conditioned on the text embeddings. The super-resolution models use Efficient U-Nets, and sampling uses dynamic thresholding. The model achieves SOTA FID and the best results on DrawBench, a new benchmark introduced by Google.
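The three-stage cascade above can be sketched in a few lines (shapes only — the real models are diffusion U-Nets, and every function name here is an illustrative placeholder, not the paper's API):

```python
import numpy as np

def encode_text(prompt: str, embed_dim: int = 4096) -> np.ndarray:
    """Stand-in for the frozen T5-XXL encoder: one embedding per token."""
    tokens = prompt.split()
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal((len(tokens), embed_dim))

# Placeholder stages: each returns an image at the stage's output resolution.
def base_model(text_emb):            # base text-to-image model, 64x64
    return np.zeros((64, 64, 3))

def sr_model_1(img, text_emb):       # first super-resolution model, 64 -> 256
    return np.zeros((256, 256, 3))

def sr_model_2(img, text_emb):       # second super-resolution model, 256 -> 1024
    return np.zeros((1024, 1024, 3))

def generate(prompt: str) -> np.ndarray:
    """Full cascade: text embeddings condition every stage."""
    emb = encode_text(prompt)
    img = base_model(emb)
    img = sr_model_1(img, emb)
    return sr_model_2(img, emb)
```

Note that the same frozen text embeddings are fed to all three stages; only the image pathway is trained.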

Strengths

  • Generates text inside images very well, much better than DALL·E 2 or DDPM
  • Simple architecture that is easy to train
  • Dynamic thresholding is shown to improve both text–image alignment and image fidelity.
  • Nice demonstration of transfer learning from T5’s text-only domain to the text–image domain
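Dynamic thresholding itself is simple enough to sketch. Per the paper, at each sampling step the predicted clean image is clipped to a data-dependent range and rescaled, which counteracts pixel saturation at high classifier-free guidance weights (a minimal numpy sketch; the function name and default percentile are illustrative):

```python
import numpy as np

def dynamic_threshold(x0_hat: np.ndarray, p: float = 0.995) -> np.ndarray:
    """Dynamic thresholding from the Imagen paper.

    Set s to the p-th percentile of |x0_hat| over the image. If s > 1,
    clip x0_hat to [-s, s] and divide by s, so the output always lies in
    [-1, 1] without crushing the in-range pixels to the boundary.
    """
    s = np.percentile(np.abs(x0_hat), p * 100)
    s = max(s, 1.0)  # only rescale when values actually exceed [-1, 1]
    return np.clip(x0_hat, -s, s) / s
```

Compared to static thresholding (a hard clip to [-1, 1]), this keeps relative pixel intensities when guidance pushes predictions far out of range.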

Weaknesses

  • Performs less well on prompts involving counting or positional relationships, possibly due to limitations of the T5-XXL language model.
  • The description of the methods and architecture is unintuitive and lacks detail, making it harder for other researchers to understand and replicate the results.
  • Doesn’t describe the training dataset statistics at any level of detail, which makes it very hard for the research community to understand the implications.

Reflection

It seems like the model demonstrated that text encoding doesn’t necessarily have to be aligned with vision to generate good images. Put another way, it is possible, and perhaps even easier, to take frozen text embeddings and train the vision model to align with them.

Most interesting thought/idea from reading this paper

Most of the individual improvements in this paper are not outstanding, which makes me think that, even for image generation, computing resources are still the most important factor.
