
Visual scenes often consist of sets of independent objects. Yet most current vision models make no assumptions about this structure in the images they process.
Yannic Kilcher explores a paper on object-centric learning.
By imposing an objectness prior, this paper proposes a module that can recognize permutation-invariant sets of objects from pixels in both supervised and unsupervised settings. It does so by introducing a slot attention module that combines an attention mechanism with dynamic routing.
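To make the idea of slots competing for inputs over several routing iterations more concrete, here is a minimal NumPy sketch. It is a simplification under stated assumptions, not the paper's implementation: it omits the learned projections and learned slot-update step, and replaces the update with a plain attention-weighted mean. The function name `slot_attention_sketch` and its parameters are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention_sketch(inputs, num_slots=4, num_iters=3, seed=0, eps=1e-8):
    """Simplified iterative routing (hypothetical helper, no learned parameters).

    inputs: (N, D) array of per-location feature vectors.
    Returns (slots, attn) with shapes (num_slots, D) and (N, num_slots).
    """
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    # Assumption: slots start as random Gaussian vectors.
    slots = rng.normal(size=(num_slots, d))
    for _ in range(num_iters):
        # Dot-product similarity between every input and every slot.
        logits = inputs @ slots.T / np.sqrt(d)      # (N, K)
        # Softmax over the slot axis: slots compete for each input.
        attn = softmax(logits, axis=1)              # (N, K)
        # Normalize per slot so each slot takes a weighted mean of inputs.
        weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
        # Simplified update: replace each slot with its weighted mean.
        slots = weights.T @ inputs                  # (K, D)
    return slots, attn
```

Because each slot is updated with a sum over all inputs, shuffling the order of the inputs leaves the resulting slots unchanged in this sketch, which illustrates the permutation invariance with respect to the inputs mentioned above.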
Content index:
- 0:00 – Intro & Overview
- 1:40 – Problem Formulation
- 4:30 – Slot Attention Architecture
- 13:30 – Slot Attention Algorithm
- 21:30 – Iterative Routing Visualization
- 29:15 – Experiments
- 36:20 – Inference Time Flexibility
- 38:35 – Broader Impact Statement
- 42:05 – Conclusion & Comments