
Paper Title: Emerging Properties in Self-Supervised Vision Transformers
Authors: Caron et al.
Date: 2021-05
Link: https://arxiv.org/abs/2104.14294

Paper Review

Short Summary

The paper presents DINO, “a form of self-distillation with no labels”. The teacher network (T) and the student network (S) share the same architecture; T is updated as a momentum (exponential moving average) copy of S, which acts like an ensemble of student checkpoints and therefore performs better than S. The authors claim that balancing the centering and sharpening of T’s output is enough to avoid mode collapse and encourage convergence. They also conduct an extensive ablation study and benchmark against SOTA models. DINO shows SOTA performance compared to other self-supervised models while enjoying a limited computational budget for pre-training.
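The training loop behind this is compact. Below is a minimal sketch of the mechanism as summarized above, loosely following the paper’s pseudo-code; the function name, temperatures (tps, tpt), and momentum rates (l, m) are illustrative placeholders, not the official implementation.

```python
import torch
import torch.nn.functional as F

def dino_step(student, teacher, x1, x2, center, tps=0.1, tpt=0.04, l=0.996, m=0.9):
    # Two augmented views of the same image go through both networks.
    s1, s2 = student(x1), student(x2)
    with torch.no_grad():
        t1, t2 = teacher(x1), teacher(x2)

    def H(t, s):
        # Teacher output is centered and sharpened (low temperature tpt),
        # student output is sharpened with tps; loss is their cross-entropy.
        t = F.softmax((t - center) / tpt, dim=-1)
        s = F.log_softmax(s / tps, dim=-1)
        return -(t * s).sum(dim=-1).mean()

    loss = (H(t1, s2) + H(t2, s1)) / 2
    loss.backward()
    # ... optimizer step on the student goes here ...

    with torch.no_grad():
        # Teacher parameters track an exponential moving average of the student.
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.data.mul_(l).add_((1 - l) * ps.detach().data)
        # The center tracks a running mean of teacher outputs; together with
        # the sharpening above this is what is claimed to prevent collapse.
        center.mul_(m).add_((1 - m) * torch.cat([t1, t2]).mean(dim=0))
    return loss
```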

Strengths

  • Kudos to the novel idea of self-distillation, and for actually making it work.
  • Computationally efficient, and much friendlier to the research community than other pre-training approaches.
  • Surprising out-of-the-box performance with a linear classifier and k-NN on frozen features, which makes it suitable for many use cases (see the sketch after this list).
  • The detailed ablation study and discussion help build intuition for how the method works.
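As a rough illustration of that out-of-the-box evaluation, here is a hypothetical sketch that extracts frozen features from a pre-trained DINO backbone (assuming the torch.hub entry point from the official DINO repository) and fits a k-NN classifier on them; the data loaders are placeholders, and k=20 follows the paper’s k-NN protocol.

```python
import torch
from sklearn.neighbors import KNeighborsClassifier

# Frozen backbone; no fine-tuning of any kind.
backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits16").eval()

@torch.no_grad()
def embed(loader):
    feats, labels = [], []
    for images, targets in loader:        # any torchvision-style loader
        feats.append(backbone(images))    # frozen DINO features
        labels.append(targets)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

X_train, y_train = embed(train_loader)    # train_loader / val_loader assumed
X_val, y_val = embed(val_loader)

knn = KNeighborsClassifier(n_neighbors=20)
knn.fit(X_train, y_train)
print("k-NN accuracy:", knn.score(X_val, y_val))
```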

Weaknesses

  • Although the paper includes an ablation study on this point, I find the discussion of how mode collapse is avoided inconclusive: balancing centering and sharpening by tuning the sharpening temperature does not seem like a robust mechanism.
  • The method leaves many hyperparameters to tune, so although a single pre-training run has a small computational budget, it is unclear how much compute the hyperparameter tuning required.

Reflection

Self-distillation seems to be a promising research direction. Even though this paper describes a lot of the intuition behind the method, many aspects still seem unexplored.

Most interesting thought/idea from reading this paper

DINO does not seem to rely on rotation augmentations, so it might be a good fit for our electronic-assembly anomaly detection project.
