Paper Title Affordances from Human Videos as a Versatile Representation for Robotics
Authors Bahl et al.
Date 2023-06
Link https://robo-affordances.github.io/resources/vrb_paper.pdf

Paper Review

Short Summary

The paper introduces a representation learning method, along with methods for transferring it, that bridges the gap between deep-learning-based visual models and robotic tasks. The proposed representation is learned from egocentric human videos and has two parts: a point of contact and a post-contact trajectory. The authors also describe how to use the learned representation to bootstrap 4 different tasks, all of which center on narrowing the robot’s attention to a smaller set of contact points and a smaller action space. Finally, the experiments show that the learned representation outperforms current methods on most of the benchmark datasets.
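To make the two-part representation concrete, here is a minimal sketch of how a predicted affordance could narrow a robot's choices. Everything in it is an assumption for illustration: the `Affordance` container, the pixel-coordinate convention, the idea that waypoints are stored as offsets from the contact point, and the `sample_action` helper are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass

# (x, y) in image pixels; an assumed convention, not the paper's.
Point = tuple[float, float]


@dataclass
class Affordance:
    """Hypothetical container for the two quantities described above."""
    contact_points: list[Point]            # candidate points of contact
    contact_weights: list[float]           # relative likelihood of each candidate
    post_contact_trajectory: list[Point]   # waypoint offsets from the contact point


def sample_action(aff: Affordance, rng: random.Random) -> list[Point]:
    """Pick one contact point, then return absolute waypoints to follow."""
    cx, cy = rng.choices(aff.contact_points, weights=aff.contact_weights, k=1)[0]
    waypoints = [(cx + dx, cy + dy) for dx, dy in aff.post_contact_trajectory]
    return [(cx, cy)] + waypoints


if __name__ == "__main__":
    aff = Affordance(
        contact_points=[(10.0, 20.0), (30.0, 40.0)],
        contact_weights=[0.7, 0.3],
        # Five waypoints, mirroring the "trajectory of length 5" in the review.
        post_contact_trajectory=[(float(i), 0.0) for i in range(1, 6)],
    )
    print(sample_action(aff, random.Random(0)))
```

The point of the sketch is only that the robot samples from a handful of likely contact points and follows a short trajectory, rather than searching the whole image and action space.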

Strengths

  • A novel, simple, yet effective method for bridging the gap between the success of visual models and robotic tasks.
  • Leverages existing off-the-shelf tools to collect labels effectively. Many good design choices were made during this step (e.g., using a GMM to collect the contact points, and a clever way to limit the data disparity between human-centric and robot-centric images).
  • Comprehensive work, from ideation to deployment and experiments in a real-life robotic setting.
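As a rough illustration of the GMM step mentioned above, the sketch below fits a small Gaussian mixture to 2-D contact points with plain EM. It is a simplified stand-in, not the authors' pipeline: the function name `fit_spherical_gmm`, the single-variance-per-component (spherical) covariance, and the farthest-point initialisation are all assumptions chosen to keep the example short and deterministic.

```python
import math


def fit_spherical_gmm(points, k, iters=50):
    """Fit a 2-D Gaussian mixture with one variance per component using EM."""
    n = len(points)
    # Farthest-point initialisation: start at the first point, then repeatedly
    # add the point that is farthest from all means chosen so far.
    means = [points[0]]
    while len(means) < k:
        means.append(max(points, key=lambda p: min(
            (p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2 for m in means)))
    # Start every component with the overall per-dimension variance.
    gx = sum(p[0] for p in points) / n
    gy = sum(p[1] for p in points) / n
    var0 = sum((p[0] - gx) ** 2 + (p[1] - gy) ** 2 for p in points) / (2 * n)
    variances = [max(var0, 1e-6)] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x, y in points:
            dens = []
            for (mx, my), var, w in zip(means, variances, weights):
                d2 = (x - mx) ** 2 + (y - my) ** 2
                dens.append(w * math.exp(-d2 / (2 * var)) / (2 * math.pi * var))
            total = sum(dens) or 1e-300  # guard against total underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            d2 = sum(r[j] * ((p[0] - mx) ** 2 + (p[1] - my) ** 2)
                     for r, p in zip(resp, points)) / nj
            means[j] = (mx, my)
            variances[j] = max(d2 / 2, 1e-6)  # divide by the dimension (2)
            weights[j] = nj / n
    return weights, means, variances


if __name__ == "__main__":
    # Two made-up clusters of contact points (pixel coordinates).
    pts = [(0, 0), (2, 0), (0, 2), (2, 2),
           (100, 100), (102, 100), (100, 102), (102, 102)]
    print(fit_spherical_gmm(pts, k=2))
```

The fitted means act as a compact summary of where contact tends to happen, which is one plausible reading of how a GMM can turn noisy detections into a small set of contact-point labels.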

Weaknesses

  • The setup and training of the affordance model are not described in enough detail (e.g., the output format of the transformer-based trajectory network is unclear; the authors only mention a “trajectory of length 5”).
  • The authors do not discuss in detail how their approach relates to other bridging approaches, or why they think theirs is superior.

Reflection

I had never thought about the gap between ML models (vision, NLP) and real-world problems, or realized that, despite the recent success of deep learning, not much has transferred to the offline world. This paper reminds me of this gap and of the challenges in bridging it.

Most interesting thought/idea from reading this paper

The gap between ML models and real-world physical problems may create a lot of jobs for ML practitioners in the next few years.
